
Add ChaisemartinDHaultfoeuille (dCDH) DID_M estimator (Phase 1) #290

Merged
igerber merged 22 commits into main from dcdh-estimator
Apr 12, 2026

Conversation

@igerber
Owner

@igerber igerber commented Apr 11, 2026

Summary

  • Adds ChaisemartinDHaultfoeuille (alias DCDH) — Phase 1 of the de Chaisemartin-D'Haultfœuille estimator family, and the only modern staggered DiD estimator in the library that handles non-absorbing (reversible) treatments: treatment can switch on AND off over time (marketing campaigns, seasonal promotions, on/off policy cycles, binary fuzzy designs).
  • Implements DID_M from de Chaisemartin & D'Haultfœuille (2020) AER, equivalently DID_1 (horizon l = 1) of the dynamic companion paper (NBER WP 29873).
  • Forward-compatible API: Phase 2 (multi-horizon event study via aggregate/L_max) and Phase 3 (covariate adjustment via controls, group-specific linear trends, HonestDiD) parameters present from day one with NotImplementedError gates pointing at the relevant phase.
  • Ships joiners-only (DID_+) and leavers-only (DID_-) decompositions, a single-lag placebo (DID_M^pl) point estimate, an optional multiplier bootstrap (Rademacher / Mammen / Webb), a TWFE decomposition diagnostic from Theorem 1 of AER 2020 (per-cell weights, fraction negative, sigma_fe, beta_fe), plus a standalone twowayfeweights() helper.
  • Adds generate_reversible_did_data() generator with 7 patterns (single_switch default, joiners_only, leavers_only, mixed_single_switch, random, cycles, marketing).
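For intuition, the headline comparison can be sketched in a few lines of plain Python (toy data and a hypothetical helper, not the library's API): the per-period joiner term contrasts outcome changes of cells switching 0 → 1 at t against cells staying at 0 over the same pair of periods.

```python
# Toy balanced panel, 3 periods: treatment paths and outcomes per group.
D = {"a": [0, 1, 1], "b": [0, 0, 0], "c": [1, 1, 0], "d": [1, 1, 1]}
Y = {"a": [1.0, 3.0, 3.5], "b": [1.0, 1.5, 2.0], "c": [2.0, 2.5, 1.0], "d": [2.0, 2.5, 3.0]}

def did_plus_t(t):
    """DID_{+,t}: mean outcome change of joiners (0 -> 1 at t) minus
    mean outcome change of stable-0 cells. None when not identified."""
    joiners = [g for g in D if D[g][t - 1] == 0 and D[g][t] == 1]
    stable0 = [g for g in D if D[g][t - 1] == 0 and D[g][t] == 0]
    if not joiners or not stable0:
        return None
    dy = lambda gs: sum(Y[g][t] - Y[g][t - 1] for g in gs) / len(gs)
    return dy(joiners) - dy(stable0)

print(did_plus_t(1))  # 1.5: group "a" joins at t=1, "b" is the stable-0 control
```

Group "c" is a leaver at t = 2, so the symmetric DID_{-,t} term (not shown) would pick it up there.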

Methodology references

  • Method: dCDH DID_M estimator (= DID_1 at horizon l = 1 of the dynamic paper)
  • Sources:
    • de Chaisemartin, C. & D'Haultfœuille, X. (2020). Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects. American Economic Review, 110(9), 2964–2996. https://doi.org/10.1257/aer.20181169
    • de Chaisemartin, C. & D'Haultfœuille, X. (2022, revised 2024). Difference-in-Differences Estimators of Intertemporal Treatment Effects. NBER Working Paper 29873. https://www.nber.org/papers/w29873
    • Three paper reviews under docs/methodology/papers/chaisemartin-dhaultfoeuille-{2020-review,2020-appendix-review,dynamic-review}.md (committed in earlier work; the dynamic-paper review is the source for the cohort-recentered analytical variance from Web Appendix Section 3.7.3)
  • Reference implementation parity: R DIDmultiplegtDYN v2.3.3 (CRAN, maintained by the paper authors)

Intentional deviations from R DIDmultiplegtDYN (documented as **Note (deviation from R):** in REGISTRY.md)

Stable-control set definition. Python uses period-based stable definitions (stable_0(t) = any cell with D_{g,t-1} = D_{g,t} = 0, regardless of baseline D_{g,1}), matching the AER 2020 Theorem 3 cell-count notation N_{0,0,t} and N_{1,1,t} literally. R uses cohort-based definitions that additionally require D_{g,1} to match the side, matching the dynamic companion paper's cohort (D_{g,1}, F_g, S_g) framework. The two definitions agree exactly under any of: (a) the panel is pure-direction (joiners-only or leavers-only), (b) no joiner's post-switch state ever overlaps a period when leavers are switching (and vice versa), or (c) the worked example. They diverge by O(1%) on mixed-direction panels with both joiners and leavers when some joiners' post-switch cells could serve as leavers' controls (or vice versa). Parity tests use 1e-4 tolerance for pure-direction scenarios and 2.5% tolerance for mixed-direction scenarios, with the deviation explicitly documented in docs/methodology/REGISTRY.md.
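The two definitions can be sketched on a toy mixed-direction panel (hypothetical structure, not the library's internals): period-based stable_0(t) ignores the baseline treatment D_{g,1}, while the cohort-based set additionally requires D_{g,1} = 0, so a leaver's post-switch cells qualify under the first but not the second.

```python
# Treatment paths over 4 periods. "lv" is a leaver: baseline 1, then 0 from period 1 on.
D = {"jn": [0, 1, 1, 1], "nv": [0, 0, 0, 0], "lv": [1, 0, 0, 0]}

def stable_0_period_based(t):
    # Any cell with D_{g,t-1} = D_{g,t} = 0, regardless of baseline D_{g,1}.
    return {g for g in D if D[g][t - 1] == 0 and D[g][t] == 0}

def stable_0_cohort_based(t):
    # Additionally require the baseline treatment D_{g,1} to be 0.
    return {g for g in D if D[g][0] == 0 and D[g][t - 1] == 0 and D[g][t] == 0}

print(sorted(stable_0_period_based(2)))  # ['lv', 'nv']: the leaver's post-switch cells qualify
print(sorted(stable_0_cohort_based(2)))  # ['nv'] only
```

On a joiners-only or leavers-only panel no such "lv" group exists on the wrong side, which is why the two definitions coincide exactly there.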

Phase 1 placebo SE. Intentionally NaN with a UserWarning. The dynamic companion paper Section 3.7.3 derives the cohort-recentered analytical variance for DID_l only — not for the placebo DID_M^pl. Phase 2 will add multiplier-bootstrap support for the placebo via the dynamic paper's machinery; until then, placebo_se, placebo_t_stat, placebo_p_value, and placebo_conf_int remain NaN even when n_bootstrap > 0. The placebo point estimate is still computed and exposed via results.placebo_effect. A regression test (test_placebo_bootstrap_unavailable_in_phase_1) pins this contract.

Assumption 11 zero-retention convention. When some period t has joiners but no stable_0 controls (or leavers but no stable_1 controls), Python sets that period's DID_{+,t} (or DID_{-,t}) to zero but retains the switcher count in the N_S denominator, matching the AER 2020 Theorem 3 worked-example arithmetic. R drops the cohort entirely. Per-period flags (did_plus_t_a11_zeroed, did_minus_t_a11_zeroed) and a consolidated UserWarning make this visible to users. The R parity scenarios add never-treated and always-treated control cohorts so neither implementation triggers the convention, eliminating this as a source of disagreement in the parity tests.
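The arithmetic difference between the two conventions is easy to see on made-up numbers (illustrative values only; nothing here comes from the actual worked example):

```python
# Hypothetical per-period joiner effects and switcher-cell counts.
# Period 2 has joiners but no stable-0 controls, so DID_{+,2} is not identified.
did_by_period = {1: 2.0, 2: None, 3: 4.0}   # None = no stable controls at t=2
n_switchers_t = {1: 3, 2: 2, 3: 5}

# Python / AER Theorem 3 convention: zero the period but keep its switchers in N_S.
N_S = sum(n_switchers_t.values())  # 10
did_zero_retain = sum(
    n_switchers_t[t] * (0.0 if did_by_period[t] is None else did_by_period[t])
    for t in did_by_period
) / N_S

# R convention: drop the period's switchers from the denominator entirely.
kept = [t for t in did_by_period if did_by_period[t] is not None]
N_S_drop = sum(n_switchers_t[t] for t in kept)  # 8
did_drop = sum(n_switchers_t[t] * did_by_period[t] for t in kept) / N_S_drop

print(did_zero_retain, did_drop)  # 2.6 3.25
```

Adding never-treated or always-treated controls, as the parity scenarios do, ensures every period has a stable side, so neither branch fires.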

Documented matches with R (worth noting because they're substantive choices)

  • drop_larger_lower=True default — drops multi-switch groups before estimation (matches R; required for the analytical variance to be consistent with the point estimate)
  • Singleton-baseline filter (footnote 15 of dynamic paper) — drops groups whose D_{g,1} is unique
  • Cohort-recentered analytical variance from Web Appendix Section 3.7.3 of the dynamic paper (subtracts cohort-conditional means, not a grand mean — the load-bearing variance correctness check that has its own dedicated test)
  • Conservative CI under Assumption 8 (independent groups), exact only under iid sampling — documented as a **Note:** block in REGISTRY.md
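The cohort-recentering point can be illustrated with toy numbers (hypothetical contributions, not the library's internals): subtracting cohort-conditional means removes between-cohort spread that a grand mean would wrongly fold into the variance.

```python
# Hypothetical per-group variance contributions U_g and their cohorts.
U = {"g1": 1.0, "g2": 3.0, "g3": 10.0, "g4": 14.0}
cohort = {"g1": "early", "g2": "early", "g3": "late", "g4": "late"}
G = len(U)

def sum_sq(residuals):
    # (1/G) * sum of squared recentred contributions (illustrative scaling only).
    return sum(r * r for r in residuals) / G

# Grand-mean recentering (the bug the dedicated test is meant to catch).
grand = sum(U.values()) / G
v_grand = sum_sq([U[g] - grand for g in U])

# Cohort-conditional recentering (in the spirit of Web Appendix Section 3.7.3).
means = {c: sum(U[g] for g in U if cohort[g] == c) / sum(1 for g in U if cohort[g] == c)
         for c in set(cohort.values())}
v_cohort = sum_sq([U[g] - means[cohort[g]] for g in U])

print(v_grand, v_cohort)  # 27.5 2.5: cohort recentering strips between-cohort spread
```

A test that only checks overall_se > 0 cannot distinguish these two; computing both on the same data and asserting they differ (as the later commit does) can.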

No silent failures

Every drop / round / fallback emits an explicit warnings.warn() or ValueError. Five distinct warnings/errors fire on the core code paths (NaN treatment validation, NaN outcome validation, placebo not-computable, TWFE diagnostic denominator == 0, never-switching groups filter in the variance), plus the multi-switch filter, the singleton-baseline filter, and the consolidated A11 warning.

Validation

Test files added

  • tests/test_chaisemartin_dhaultfoeuille.py — 44 tests covering basic API, forward-compat gates (one per Phase 2/3 parameter), drop_larger_lower filter, A11 zero-retention, NaN handling, bootstrap (Rademacher / Mammen / Webb), results dataclass, and the new test_placebo_bootstrap_unavailable_in_phase_1 regression test
  • tests/test_methodology_chaisemartin_dhaultfoeuille.py — 8 fast + 2 slow Tier 2 methodology tests including the hand-calculable 4-group worked example (asserts DID_M = 2.5, DID_+ = 2.0, DID_- = 3.0 exactly), test_cohort_recentering_not_grand_mean (the load-bearing variance correctness check that catches the Add initial diff-diff library implementation #1 implementation bug), TWFE diagnostic correctness, and large-N recovery (slow)
  • tests/test_chaisemartin_dhaultfoeuille_parity.py — 5 R DIDmultiplegtDYN parity scenarios at horizon l = 1: single_switch_mixed, joiners_only, leavers_only, mixed_single_switch, and the hand-calculable worked example
  • tests/test_prep_dgp_reversible.py — 38 tests for the new generate_reversible_did_data() generator (shape, patterns, reproducibility, true-effect column, switcher-type column, validation errors)

Test files modified

  • tests/conftest.py — added _check_r_dcdh_available(), r_dcdh_available session fixture, and require_r_dcdh per-test fixture parallel to the existing r_available / require_r pattern, with DIFF_DIFF_R=skip env-var override

R parity reference

  • benchmarks/R/generate_dcdh_dynr_test_values.R — mirror generator + 5 scenarios, generates golden values via did_multiplegt_dyn(..., effects=1) from R DIDmultiplegtDYN v2.3.3
  • benchmarks/data/dcdh_dynr_golden_values.json — committed golden values (~73 KB)
  • The R script also exercises a polars r-universe backend that DIDmultiplegtDYN >= 2.x requires
  • The Python parity test fixture skips cleanly when R or DIDmultiplegtDYN is unavailable, so the test suite still runs on dev machines without R installed
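The availability gate can be sketched roughly like this (hypothetical logic, not the actual conftest.py helper): honor the DIFF_DIFF_R=skip override first, then look for Rscript on PATH.

```python
import shutil

def r_dcdh_available(env_override=None):
    # Hypothetical sketch of the availability check, not the real conftest helper.
    # An explicit DIFF_DIFF_R=skip override wins; otherwise Rscript must be on PATH.
    # (The real fixture presumably also verifies that DIDmultiplegtDYN loads.)
    if env_override == "skip":
        return False
    return shutil.which("Rscript") is not None

print(r_dcdh_available(env_override="skip"))  # False: explicit opt-out
```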

Documentation surface

Per docs/doc-deps.yaml for the new chaisemartin_dhaultfoeuille group:

  • docs/methodology/REGISTRY.md — full ## ChaisemartinDHaultfoeuille section with assumption checks, estimator equations, the four **Note:** blocks (conservative CI, drop_larger_lower, A11 zero-retention, singleton-baseline filter), the **Note (deviation from R DIDmultiplegtDYN):** block, edge cases, requirements checklist
  • docs/api/chaisemartin_dhaultfoeuille.rst — autoclass directives + autofunction for chaisemartin_dhaultfoeuille() and twowayfeweights(), with usage examples
  • docs/api/index.rst — toctree entry; docs/api/prep.rst — generate_reversible_did_data autofunction
  • docs/choosing_estimator.rst — Quick Reference row, decision-flowchart question, Survey Compatibility Matrix row (-- in all four columns), Detailed Guidance subsection
  • docs/practitioner_decision_tree.rst — new "Reversible Treatment (On/Off Cycles)" section with business-language framing
  • docs/llms.txt and docs/llms-full.txt — bullet + ~100-line section
  • README.md — features bullet, alias table entry, full usage section
  • CHANGELOG.md — ### Added entry under ## [Unreleased]
  • ROADMAP.md — Phase 1 items (1a–1i) flipped from "Not started" to "Shipped"; Phase 2 / Phase 3 retained

Test count

96 dCDH tests passing locally (44 API + 8 fast methodology + 2 slow methodology [deselected by default] + 5 R parity + 38 generator, minus overlap = 95 deselectable, + 1 placebo regression = 96 total). The hand-calculable 4-group worked example produces DID_M = 2.5, exactly matching R DIDmultiplegtDYN.
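Those worked-example numbers are internally consistent under the Theorem 3 weighting: DID_M is the switcher-count-weighted average of the joiner and leaver components, so equal joiner and leaver cell counts (assumed here purely for illustration; the actual counts live in the test file) reproduce 2.5 from 2.0 and 3.0.

```python
did_plus, did_minus = 2.0, 3.0
n_joiner_cells, n_leaver_cells = 1, 1   # equal weights assumed for illustration
N_S = n_joiner_cells + n_leaver_cells
did_m = (n_joiner_cells * did_plus + n_leaver_cells * did_minus) / N_S
print(did_m)  # 2.5
```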

No tutorial in this PR

Per the multi-phase plan, a single comprehensive dCDH tutorial ships in Phase 3 after covariate support is added. Phase 1 deliberately ships no tutorial.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

igerber and others added 3 commits April 11, 2026 15:50
Phase 1 of the de Chaisemartin-D'Haultfoeuille estimator family — the only
modern staggered DiD estimator in the library that handles non-absorbing
(reversible) treatments. Treatment can switch on AND off over time
(marketing campaigns, seasonal promotions, on/off policy cycles).

Implements DID_M from de Chaisemartin & D'Haultfoeuille (2020) AER,
equivalently DID_1 (horizon l=1) of the dynamic companion paper
(NBER WP 29873).

Ships:
- Headline DID_M point estimate with cohort-recentered analytical SE
  from Web Appendix Section 3.7.3 of the dynamic companion paper
- Joiners-only (DID_+) and leavers-only (DID_-) decompositions
- Single-lag placebo DID_M^pl (Theorem 4 of AER 2020); analytical
  placebo SE deferred to Phase 2 (NaN with explicit warning)
- Optional multiplier bootstrap clustered at group level
  (Rademacher / Mammen / Webb)
- TWFE decomposition diagnostic from Theorem 1 of AER 2020 (per-cell
  weights, fraction negative, sigma_fe, beta_fe), exposed both as
  estimator field and standalone twowayfeweights() helper
- Multi-switch group filtering (drop_larger_lower=True default,
  matches R DIDmultiplegtDYN); singleton-baseline filter (footnote 15
  of dynamic paper); never-switching groups filter from variance —
  all with explicit warnings (no silent failures)
- Forward-compatible fit() signature: Phase 2 (multi-horizon event
  study, aggregate, L_max) and Phase 3 (covariate adjustment via
  controls, group-specific linear trends, HonestDiD) parameters
  raise NotImplementedError with phase pointers
- generate_reversible_did_data() generator with 7 patterns (single_switch,
  joiners_only, leavers_only, mixed_single_switch, random, cycles, marketing)
- Validated against R DIDmultiplegtDYN v2.3.3 at horizon l=1 via
  tests/test_chaisemartin_dhaultfoeuille_parity.py (5 scenarios)

Methodology notes documented in REGISTRY.md:
- Conservative CI under Assumption 8 (independent groups), exact only
  under iid sampling
- A11 zero-retention convention (paper-faithful, with consolidated warning)
- drop_larger_lower default and inconsistency warning when False
- Singleton-baseline filter (footnote 15)
- Phase 1 placebo SE intentionally NaN (deferred to Phase 2)
- Note (deviation from R DIDmultiplegtDYN): Python uses period-based
  stable-control sets matching AER 2020 Theorem 3 literally; R uses
  cohort-based controls from the dynamic paper. Agree exactly on
  pure-direction panels and the worked example; differ by ~1% on
  mixed-direction panels with both joiners and leavers.

Tests: 95 passing locally (44 API, 8 fast methodology + 2 slow Tier 2,
38 generator, 5 R parity). Hand-calculable 4-group worked example
produces DID_M = 2.5 exactly matching R.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…s API

Methodology / inference:
- Bootstrap-updated overall/joiners/leavers inference now goes through
  safe_inference() instead of inline t_stat = effect / se. Matches the
  project anti-pattern rule and keeps NaN behavior consistent with the
  rest of the library.

Placebo doc/code alignment:
- Phase 1 placebo bootstrap is deferred to Phase 2 (the dynamic companion
  paper Section 3.7.3 derives the cohort-recentered analytical variance
  for DID_l only; the placebo's influence-function representation is not
  yet implemented). The bootstrap path covers DID_M, DID_+, and DID_-
  only.
- Updated the source warning text, the placebo_inputs comment, README,
  CHANGELOG, REGISTRY, choosing_estimator.rst, llms-full.txt to drop
  the misleading "set n_bootstrap > 0 for a placebo SE today" promise
  and state clearly that placebo inference fields stay NaN even with
  bootstrap enabled.
- Added test_placebo_bootstrap_unavailable_in_phase_1 to pin this
  contract so future contributors do not silently widen the bootstrap
  surface without wiring up the documented Phase 2 derivation.

twowayfeweights() public API:
- Added ValueError validation for missing outcome/group/time/treatment
  columns (was treatment-only).
- Updated README, docs/api/chaisemartin_dhaultfoeuille.rst, and
  docs/llms-full.txt examples to pass outcome="outcome" so the
  documented usage actually runs.

set_params():
- Re-runs the same __init__ validation rules so invalid parameter
  combinations cannot slip in after construction.

Tests:
- 96 passing locally (was 95). The new placebo test runs alongside
  TestBootstrap.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Final doc inconsistency caught on re-review: practitioner_decision_tree.rst
still said "set n_bootstrap > 0 for a bootstrap-based placebo SE today" while
README, REGISTRY, choosing_estimator, and llms-full had been updated to
reflect that placebo bootstrap is deferred to Phase 2 and placebo inference
fields stay NaN even when n_bootstrap > 0.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: fc881fb78d806e089150f0eba1d584886f845a5e


Overall Assessment

⚠️ Needs changes

This PR adds a new estimator and touches methodology-critical code paths, but I found three unmitigated P1 issues in the estimation/inference logic. Static review only: I was not able to run the new test suite in this shell because the local Python runtime here is missing project deps such as pandas.

Executive Summary

  • The Phase 1 DID_M / DID_M^pl implementation is using per-cell observation counts n_gt as Theorem 3/4 weights, while the new Methodology Registry defines N_{a,b,t} as counts of (g,t) cells. That silently changes the estimand on individual-level inputs with uneven cell sizes.
  • The analytical SE path does not implement the registry’s full U^G_g = (Lambda^G_{g,l=1} * Y_g).sum() construction; it explicitly substitutes a first-switch-only “Phase 1 simplification,” and the bootstrap reuses that same vector.
  • The singleton-baseline filter is applied to the estimation sample, although the registry documents it as a variance-stage filter. Under the documented period-based stable-control deviation, that can change DID_M, not just the variance.
  • The new methodology-reference files linked from the roadmap/doc-deps are not present in the repo, and the added tests do not actually validate SE parity or the claimed “cohort recentering vs grand mean” distinction.
  • I did not find additional security issues in the changed files.

Methodology

Code Quality

No additional material findings beyond the methodology issues above.

Performance

No material findings in the changed performance-sensitive paths beyond the correctness issues above.

Maintainability

No additional material findings beyond the documentation/test gaps below.

Tech Debt

No mitigating TODO.md entry covers the P1 issues above, so they are not deferrable under the project’s review policy.

Security

No material security findings in the changed files.

Documentation/Tests

  • Severity: P2. Impact: the PR adds methodology-reference links that do not resolve in this repo, and the new tests do not actually validate the claimed inference correctness. ROADMAP.md and docs/doc-deps.yaml now point reviewers to three docs/methodology/papers/chaisemartin-dhaultfoeuille-*.md files, but those files are not present in this checkout. Separately, the supposed “load-bearing” recentering test only asserts that overall_se > 0, and the R parity suite checks point estimates only, not SEs. That leaves the incorrect SE path above effectively unguarded. Concrete fix: commit the referenced methodology-review files, make doc-deps/path resolution fail CI when missing, turn test_cohort_recentering_not_grand_mean into a real λ-vs-grand-mean regression test, and add SE parity assertions for pure-direction scenarios. Refs: ROADMAP.md:L125-L129, docs/doc-deps.yaml:L261-L275, tests/test_methodology_chaisemartin_dhaultfoeuille.py:L162-L251, tests/test_chaisemartin_dhaultfoeuille_parity.py:L122-L168

Path to Approval

  1. Rework the Theorem 3/4 aggregation to use transition-cell counts rather than n_gt observation counts, or explicitly document and test a weighted estimand as a deviation.
  2. Replace the current first-switch-only SE shortcut with the full Lambda^G_{g,l=1} influence-function implementation from Section 3.7.3, and use that same vector for the bootstrap path.
  3. Move the singleton-baseline filter out of the point-estimate sample unless you also document it as a point-estimate deviation and show parity/hand-worked evidence for the altered estimator.
  4. Commit the missing methodology-review markdown files and strengthen the new tests so they actually fail on grand-mean recentering, wrong λ construction, and SE mismatches versus R on pure-direction designs.

Addresses three methodology issues in the dCDH Phase 1 implementation:

1. Cell-count weighting (P1): the per-period DID and placebo formulas
   were computing N_{a,b,t} as the sum of within-cell observation counts
   instead of the count of (g, t) cells. This silently changed the
   estimand on individual-level inputs with uneven cell sizes. Fixed by
   replacing n_curr[mask].sum() with int(mask.sum()) and using simple
   .mean() over cells instead of n_curr-weighted averages. The fix
   matches the AER 2020 Theorem 3 cell-count notation literally and
   matches R DIDmultiplegtDYN's default behavior on cell-aggregated
   input. Renamed the results field n_switcher_obs -> n_switcher_cells
   for accuracy. Test data uses balanced 1-obs-per-cell panels so the
   regression test_cell_count_weighting_unbalanced_input is needed to
   guard the unbalanced-input path.

2. Full Lambda^G_{g,l=1} influence function (P1): the analytical SE
   path was using a "first-switch only" shortcut where each switching
   group's U^G_g came from its own switch period only. This omitted the
   periods where switching groups serve as stable controls for other
   cohorts. Replaced with _compute_full_per_group_contributions, which
   sums role-weighted outcome differences (joiner +1, stable_0
   -n_10/n_00, leaver -1, stable_1 +n_01/n_11) across all periods. The
   bootstrap path inherits the corrected vectors automatically. SE
   parity vs R DIDmultiplegtDYN on pure-direction scenarios narrowed
   from ~18% to ~3%; pure-direction parity tests now assert SE within
   5% rtol.

3. Singleton-baseline filter at variance stage only (P1): footnote 15
   of the dynamic paper says singleton-baseline groups have no cohort
   peer for the cohort-recentered variance. The previous code applied
   the filter as a point-estimate filter (dropping the groups before
   the per-period DIDs), which under Python's documented period-based
   stable-control interpretation could change DID_M when a unique-
   baseline always-treated or never-treated group would otherwise serve
   as a valid stable control. Fixed by computing the singleton-baseline
   group set in fit() and passing it to the variance helper, which
   excludes those groups from the cohort enumeration and U vector
   only. The cell DataFrame retains them. Updated
   TestSingletonBaselineFilter to assert the variance-only semantics
   and the warning text.
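The role weights described in item 2 can be sketched as follows (toy inputs and a hypothetical helper, not the library's internals): each group's U^G_g accumulates role-weighted outcome first-differences over every period in which the group participates, whether as a switcher or as a stable control.

```python
def u_g(role_by_period, dy_by_period, counts_by_period):
    """Sum role-weighted outcome first-differences for one group.
    counts_by_period[t] = (n_10, n_00, n_01, n_11) at period t.
    Role weights follow the commit text: joiner +1, stable_0 -n_10/n_00,
    leaver -1, stable_1 +n_01/n_11."""
    total = 0.0
    for t, role in role_by_period.items():
        n_10, n_00, n_01, n_11 = counts_by_period[t]
        w = {"joiner": 1.0,
             "stable_0": -n_10 / n_00 if n_00 else 0.0,
             "leaver": -1.0,
             "stable_1": n_01 / n_11 if n_11 else 0.0,
             "none": 0.0}[role]
        total += w * dy_by_period[t]
    return total

# A group that joins at t=1 and then serves as a stable-1 control at t=2.
u = u_g({1: "joiner", 2: "stable_1"},
        {1: 2.0, 2: 0.5},
        {1: (1, 2, 0, 1), 2: (0, 3, 1, 2)})
print(u)  # 2.25 = 1 * 2.0 + (1/2) * 0.5
```

The "first-switch only" shortcut this commit removes corresponds to keeping only the t=1 term above.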

Documentation:
- doc-deps.yaml and ROADMAP.md no longer link to gitignored
  paper review files (REGISTRY.md is the canonical source).
- REGISTRY.md updated with cell-count notation, the full IF formula,
  the variance-only singleton-filter scope, the new SE parity gap
  numbers (3% vs the prior 18%), and a clearer period-vs-cohort
  deviation note.

Tests:
- test_cohort_recentering_not_grand_mean now actually computes both
  formulas on the same data and asserts they differ materially. The
  prior version only checked overall_se > 0.
- test_cell_count_weighting_unbalanced_input pins the cell-count
  contract on a deliberately unbalanced individual-level panel.
- test_parity_joiners_only and test_parity_leavers_only now assert
  SE parity at 5% rtol (was previously skipped).
- 97 dCDH tests passing (96 + the new cell-count regression).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 0d70f4d0deac473176a1c6fd467e06eb7aa810e4


Overall Assessment

⚠️ Needs changes

Static review only: this shell is missing pytest, pandas, and numpy, so I could not run the new dCDH test suite locally. Assessment is based on two unmitigated P1 findings in methodology-critical paths.

Executive Summary

  • The three prior P1 blockers from the last review appear resolved: Theorem 3 now uses cell counts, the full Lambda^G_{g,l=1} influence-function path is present, and the singleton-baseline filter is now variance-only.
  • P1 [Newly identified]: the analytical variance path still assumes a complete baseline observation and observed adjacent treatment pairs, but the estimator accepts ragged panels; missing periods can be misread as switches or crash cohort-key construction.
  • P1 [Newly identified]: standalone twowayfeweights() bypasses the estimator’s documented validation contract and can silently drop/round malformed treatment/outcome cells.
  • P2: joiner/leaver metadata is mislabeled and partially miscomputed, so summaries and to_dataframe(level="joiners_leavers") can report the wrong sample-size quantities.
  • P2: the parity suite still skips without live R even though committed JSON golden values are already available, which weakens regression coverage for a methodology-critical estimator.

Methodology

Affected methods: dCDH DID_M / DID_+ / DID_- variance path from Web Appendix Section 3.7.3, and the standalone TWFE diagnostic from AER 2020 Theorem 1. Re-review note: the prior findings on cell-count weighting, full IF construction, and variance-only singleton-baseline filtering look addressed in diff_diff/chaisemartin_dhaultfoeuille.py:L511-L554, diff_diff/chaisemartin_dhaultfoeuille.py:L1088-L1096, and diff_diff/chaisemartin_dhaultfoeuille.py:L1237-L1346.

  • Severity: P1 [Newly identified]. Impact: the estimator now tolerates missing (group, time) cells in the point-estimate path, but the cohort/variance path still treats raw NaN entries in D_mat as if they were observed treatment states. D_mat is built with NaN gaps for missing cells, the point-estimate path correctly gates transitions on present, but _compute_cohort_recentered_inputs() ignores that guard and detects first switches from raw D_mat[g, t] != D_mat[g, t-1]; a group missing the first global period can also hit int(np.nan) when cohort keys are built. Step 7 also uses each group’s first observed row, not guaranteed D_{g,1}, for singleton-baseline identification. This can silently misclassify cohorts and distort overall_se / joiners_se / leavers_se, or fail with an opaque error on late-entry groups. Concrete fix: after building N_mat, either explicitly reject groups missing the first global period or drop them with a user-facing warning, and compute first_switch_idx / switch_direction only on adjacent observed pairs (N_mat[:, t] > 0 and N_mat[:, t-1] > 0). Add regression tests for a missing-baseline group and an interior-gap group. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:L524-L528, diff_diff/chaisemartin_dhaultfoeuille.py:L568-L583, diff_diff/chaisemartin_dhaultfoeuille.py:L1078-L1083, diff_diff/chaisemartin_dhaultfoeuille.py:L1443-L1467.

  • Severity: P1 [Newly identified]. Impact: the new public twowayfeweights() helper does not honor the dCDH validation rules documented in the registry and enforced in fit(). fit() explicitly rejects NaN treatment/outcome values, rejects non-binary treatment, and warns before majority-rounding within-cell mixed treatment. twowayfeweights() does none of that; it only coerces treatment numeric, then aggregates with mean()/count() and thresholds d_gt >= 0.5. That means NaN outcomes/treatments can be silently dropped by the aggregation, non-binary treatment can be silently thresholded into {0,1}, and within-cell varying treatment is rounded with no warning, yielding misleading Theorem 1 diagnostics. Concrete fix: factor the validation + cell-aggregation logic into a shared helper used by both fit() and twowayfeweights(), with identical NaN, binary-treatment, and within-cell-rounding behavior, and add tests for each case. Refs: docs/methodology/REGISTRY.md:L470-L473, docs/methodology/REGISTRY.md:L581-L583, diff_diff/chaisemartin_dhaultfoeuille.py:L392-L456, diff_diff/chaisemartin_dhaultfoeuille.py:L1765-L1809.
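The presence-gated switch detection proposed in the first finding can be sketched with numpy (toy arrays and hypothetical names, not the library's code): a NaN in D_mat marks a missing (group, time) cell, and transitions are only counted between adjacent observed periods.

```python
import numpy as np

# Group 0 has an interior gap at t=1; a naive D_mat[g, t] != D_mat[g, t-1]
# comparison across the gap could misread the gap as a switch.
D_mat = np.array([[0, np.nan, 1, 1],
                  [0, 0,      1, 1]], dtype=float)
N_mat = np.where(np.isnan(D_mat), 0, 1)  # per-cell observation counts

def first_switch(g):
    for t in range(1, D_mat.shape[1]):
        # Only compare adjacent periods when BOTH cells are observed.
        if N_mat[g, t] > 0 and N_mat[g, t - 1] > 0 and D_mat[g, t] != D_mat[g, t - 1]:
            return t
    return None

print(first_switch(0), first_switch(1))  # None 2: the gap blocks detection for group 0
```

Whether a gapped group is then dropped with a warning or rejected outright is the policy choice the fix commit settles.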

Code Quality

Performance

  • No material findings in the changed performance-sensitive paths.

Maintainability

  • Severity: P3. Impact: summary() still reports never-switching groups under “Groups dropped before estimation”, but the methodology registry now says n_groups_dropped_never_switching is backward-compat metadata and no longer represents an actual exclusion. That makes the results summary internally inconsistent with the documented estimator contract. Concrete fix: rename that block to something like “Group filter / metadata counts” and label the never-switching count as “reported, not dropped”. Refs: diff_diff/chaisemartin_dhaultfoeuille_results.py:L412-L429, docs/methodology/REGISTRY.md:L476-L476.

Tech Debt

  • No separate tech-debt finding, but TODO.md does not currently track either P1 above, so they are not mitigated under the project’s review rules. Ref: TODO.md:L51-L98.

Security

  • No material security findings in the changed files.

Documentation/Tests

  • Severity: P2. Impact: the parity tests already load committed JSON golden values, but every scenario still takes require_r_dcdh, so the whole regression suite skips when live R / DIDmultiplegtDYN is unavailable. That leaves the main methodology-regression checks inactive on ordinary Python-only environments even though the assertions themselves do not need R once the goldens are committed. Concrete fix: keep the R availability check only for the golden-value generation workflow, and let the JSON-backed parity assertions run whenever benchmarks/data/dcdh_dynr_golden_values.json exists. Refs: tests/test_chaisemartin_dhaultfoeuille_parity.py:L40-L58, tests/test_chaisemartin_dhaultfoeuille_parity.py:L122-L184, tests/conftest.py:L107-L178.

  • Severity: P3. Impact: the docs are internally inconsistent about the dCDH paper-review markdowns. ROADMAP.md now says the primary papers are not committed in-repo, while CHANGELOG.md says three dCDH paper reviews were committed under docs/methodology/papers/. Concrete fix: align the changelog with the roadmap, or add the promised review files. Refs: ROADMAP.md:L127-L129, CHANGELOG.md:L22-L22.

Path to Approval

  1. Make the Section 3.7.3 variance/cohort path robust to ragged panels: either reject groups missing the baseline period with a clear ValueError, or base baseline/switch detection on observed adjacent pairs from N_mat instead of raw NaN comparisons. Add regression tests for a late-entry group and for an interior missing period.
  2. Make twowayfeweights() reuse the same validation and cell-aggregation contract as fit(): reject NaN treatment/outcome, reject non-binary treatment before aggregation, and emit the same warning on within-cell mixed treatment. Add one test for each case.

Addresses two new P1s, two P2s, and two P3s from CI re-review:

P1: ragged panel handling
- Add fit() Step 5a validation: reject groups missing the first global
  period with a clear ValueError listing offenders, and drop groups with
  interior period gaps with an explicit UserWarning. Eliminates the
  silent NaN propagation in _compute_cohort_recentered_inputs that
  could crash the cohort enumeration with int(np.nan) or misclassify
  late-entry groups' baselines.
- Fix singleton-baseline computation in fit() Step 7 to read the
  validated first global period explicitly instead of using
  groupby.first() which returned the first OBSERVED row per group.
- Add defensive presence-gating in _compute_cohort_recentered_inputs:
  the helper now refuses to run if N_mat[:, 0] has any zero entries
  (a fit() validation regression tripwire), and the first-switch
  detection loop only counts transitions between adjacent OBSERVED
  periods. Even with fit() validation in place, the helper is now
  safe to call directly.
- Add 2 regression tests: test_missing_baseline_period_raises_value_error
  and test_interior_gap_drops_group_with_warning.

P1: twowayfeweights validation refactor
- Extract _validate_and_aggregate_to_cells helper enforcing the dCDH
  validation contract (NaN treatment / NaN outcome / non-binary
  treatment all raise ValueError; within-cell varying treatment
  emits UserWarning before majority rounding).
- Both fit() and twowayfeweights() now call the helper. Single source
  of truth for the validation rules, no drift between the two public
  entry points.
- Add 4 regression tests for twowayfeweights() validation:
  test_twowayfeweights_rejects_nan_treatment,
  test_twowayfeweights_rejects_nan_outcome,
  test_twowayfeweights_rejects_non_binary_treatment,
  test_twowayfeweights_warns_on_within_cell_rounding.

P2: joiner/leaver metadata
- Fix n_joiner_cells = int(n_10_t_arr.sum()) (was count_nonzero counting
  PERIODS, not cells). Same for n_leaver_cells.
- Compute n_joiner_obs and n_leaver_obs as actual observation counts
  (sum of n_gt over the joiner/leaver cells across periods), not as
  cell totals. For balanced one-obs-per-cell panels they equal
  n_*_cells; for individual-level inputs with multiple obs per cell
  they can be larger. Update results dataclass docstrings.

P2: parity tests run on JSON without R
- Decouple golden_values fixture from require_r_dcdh. Tests now run
  whenever the JSON file exists. R is only needed to regenerate the
  JSON via benchmarks/R/generate_dcdh_dynr_test_values.R.
- Verified by running DIFF_DIFF_R=skip pytest tests/test_chaisemartin_dhaultfoeuille_parity.py
  — all 5 parity scenarios PASS (was previously skipping entirely).

P3: summary() label rename
- Rename "Groups dropped before estimation:" to "Group filter /
  metadata counts:". Label never-switching as "(reported, not dropped)".
  Reflects the Round 2 change where never-switching groups participate
  in the variance via stable-control roles.

P3: CHANGELOG/ROADMAP consistency
- Remove the CHANGELOG bullet that claimed three paper review files
  were committed under docs/methodology/papers/. Replace with a
  reference to REGISTRY.md as the canonical methodology surface
  (matching the ROADMAP wording).

Tests: 103 dCDH passing (97 + 6 new). Worked example DID_M = 2.5 still
exact. Pure-direction R parity tests still pass at 1e-4 / 5% rtol.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner) commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: df7a0a389e78d3b82eeb3455948a9adc8d74ca12


Overall Assessment

✅ Looks good

Static review only: I could not run the new dCDH suite in this environment because pytest, numpy, and pandas are not installed.

Executive Summary

Methodology

Methodology cross-check: the PR’s Phase 1 framing is consistent with the cited source material and with the registry’s current-scope definition for DID_M / DID_1; the previously flagged methodology blockers noted above also appear resolved in the implementation. (aeaweb.org)

Code Quality

  • Severity: P2. Impact: to_dataframe(level="joiners_leavers") exposes one n_obs column with different units by row: DID_M uses switcher-cell count, while DID_+ / DID_- use raw observation counts summed over switching cells. That makes the table easy to misread programmatically and breaks the semantic contract of a single count field. Concrete fix: split this into separate n_cells and n_obs columns, or rename the current field so all rows use the same unit consistently. Refs: diff_diff/chaisemartin_dhaultfoeuille_results.py:L645, diff_diff/chaisemartin_dhaultfoeuille.py:L796.

Performance

  • No material findings in the changed performance-sensitive paths.

Maintainability

  • No material maintainability finding beyond the documentation/contract drift noted below.

Tech Debt

  • No separate tech-debt blocker in this diff. The remaining issues are low-severity contract/documentation cleanup, not silent correctness bugs.

Security

  • No material security findings in the changed files. I did not see secrets, unsafe subprocess additions, or new network-facing behavior.

Documentation/Tests

  • Severity: P3. Impact: the results-object docs are stale relative to the implemented Phase 1 contract. DCDHBootstrapResults still says placebo bootstrap outputs are populated when available, but Phase 1 intentionally never surfaces placebo bootstrap inference; n_groups_dropped_never_switching still says never-switchers are excluded from variance, while the registry/code now treat that field as backward-compat metadata only. Concrete fix: update those docstrings to match current runtime behavior. Refs: diff_diff/chaisemartin_dhaultfoeuille_results.py:L45, diff_diff/chaisemartin_dhaultfoeuille_results.py:L237, docs/methodology/REGISTRY.md:L476, docs/methodology/REGISTRY.md:L554, diff_diff/chaisemartin_dhaultfoeuille.py:L952.
  • Severity: P3. Impact: a couple of new comments overstate what the fixtures/docs guarantee. The R golden-value script says set.seed() and NumPy default_rng() produce identical draws, and the reversible DGP doc says Assumption 5 is “automatically satisfied” for binary treatment even though the PR explicitly relies on dropping multi-switch groups for those paths. These are comment-level issues, but they can mislead future maintainers about parity and identification assumptions. Concrete fix: reword both comments to describe structural mirroring rather than RNG identity, and to say the default single-switch path is A5-safe while multi-switch patterns are stress tests for the drop filter. Refs: benchmarks/R/generate_dcdh_dynr_test_values.R:L37, diff_diff/prep_dgp.py:L1877.

…trings)

P2: split joiners_leavers DataFrame into n_cells + n_obs columns
- to_dataframe(level="joiners_leavers") previously had a single n_obs
  column with mixed semantics by row (DID_M used switcher cell count;
  DID_+/DID_- used raw observation counts). Two columns with consistent
  units across all rows: n_cells (count of switching (g, t) cells) and
  n_obs (sum of n_gt over the same cells). DID_M row uses union of
  joiner + leaver cells. Updated test_to_dataframe_joiners_leavers to
  pin the new contract.
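
The two-column contract can be illustrated with a toy switcher-cell table. Column names and the DataFrame shape here are assumptions for illustration; only the semantics (n_cells = count of switching cells, n_obs = sum of n_gt over those cells, DID_M row = union of joiner and leaver cells) come from the commit message.

```python
import pandas as pd

# Toy table: one row per switching (group, period) cell,
# n_gt = observations in that cell.
cells = pd.DataFrame({
    "direction": ["joiner", "joiner", "leaver"],
    "n_gt": [3, 2, 5],
})

rows = []
for label, mask in [
    ("DID_+", cells["direction"] == "joiner"),
    ("DID_-", cells["direction"] == "leaver"),
    ("DID_M", cells["direction"].notna()),  # union of joiner + leaver cells
]:
    sub = cells[mask]
    rows.append({
        "effect": label,
        "n_cells": len(sub),                 # count of switching cells
        "n_obs": int(sub["n_gt"].sum()),     # observations over those cells
    })

table = pd.DataFrame(rows)
print(table)
```

With one observation per cell the two columns coincide; with individual-level inputs n_obs can exceed n_cells, which is exactly the ambiguity the split resolves.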

P3: stale docstrings on results object
- DCDHBootstrapResults class docstring now states explicitly that
  placebo bootstrap fields ALWAYS remain None in Phase 1 (the previous
  wording said they were "populated when available"). Per-field
  docstrings for placebo_se / placebo_ci / placebo_p_value now point
  back to the class-level note.
- n_groups_dropped_never_switching docstring now reflects the Round 2
  full-IF fix: never-switching groups participate in the variance via
  stable-control roles and the field is reported for backwards
  compatibility only — no actual exclusion happens.
- n_groups_dropped_singleton_baseline docstring clarifies the
  variance-only filter scope (cell DataFrame retains them as
  period-based stable controls).

P3: misleading R-script + prep_dgp comments
- benchmarks/R/generate_dcdh_dynr_test_values.R: clarified that the
  Python and R generators mirror each other STRUCTURALLY (same pattern
  logic, same FE/effect/noise model), not at the RNG level. R's
  set.seed and NumPy's default_rng use different RNGs. Parity tests
  load the R script's golden-value JSON so both sides operate on
  byte-identical input regardless of how it was originally generated.
- prep_dgp.py generate_reversible_did_data: clarified that the default
  single_switch pattern is A5-safe by construction (every group has at
  most one transition). Other patterns (random/cycles/marketing) ARE
  allowed to violate A5 and exist primarily as stress tests for the
  drop_larger_lower=True filter. The cohort-recentered variance
  formula is derived under A5, which is why drop_larger_lower defaults
  to True.

Tests: 103 dCDH passing (no new tests; the existing
test_to_dataframe_joiners_leavers was strengthened to assert the new
n_cells / n_obs contract).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner) commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 787eae20a2aab0a52701ab1d6f3ea88ed449e895


Overall Assessment

Needs changes

Static review only: pytest, numpy, and pandas are unavailable in this environment, so I could not execute the new dCDH suite.

Executive Summary

  • The prior re-review items I spot-checked look addressed in the current diff: shared validation in twowayfeweights(), the joiners_leavers count contract, and the previously stale dCDH docstrings/comments.
  • P1 [Newly identified]: the analytical variance path can collapse to overall_se = 0.0 on valid panels where every exact (D_{g,1}, F_g, S_g) cohort is a singleton. The PR’s own 4-group worked example appears to hit that case, while the committed R golden file records overall_se = 1.8027756377.
  • P1 [Newly identified]: the placebo path silently applies A11 zero-retention when placebo switchers lack eligible stable controls, but it does not expose the warning/flags that the main estimator exposes.
  • The documented period-vs-cohort control-set deviation in REGISTRY.md does not mitigate the SE issue; the registry explicitly treats the 4-group worked example as a case where that deviation should not matter.
  • The new tests pin ATT parity on the worked example, but they still do not pin its SE or the placebo A11 warning path, which is why the above issues can escape.

Methodology

Code Quality

  • No separate code-quality finding beyond the methodology defects above.

Performance

  • No material performance findings in the changed paths reviewed.

Maintainability

  • No separate maintainability finding beyond the methodology/test contract issues below.

Tech Debt

  • Severity: P3 informational. Impact: I did not find any TODO.md entry that tracks either methodology issue above, so they remain unmitigated under the project’s deferment rules and still count toward the assessment. Concrete fix: resolve these in code in this PR; only any leftover non-blocking cleanup should be deferred into TODO.md afterward. Refs: TODO.md:L51-L90

Security

  • No material security findings in the changed files reviewed. I did not see secrets, unsafe subprocess additions, or new network-facing behavior.

Documentation/Tests

Path to Approval

  1. Fix the analytical SE path so valid panels with singleton exact (D_{g,1}, F_g, S_g) cohorts do not silently collapse to overall_se = 0.0; if the variance is genuinely unidentified after the intended paper-consistent filtering, surface NaN plus an explicit warning instead.
  2. Make placebo A11 handling explicit by either adding placebo A11 flags plus a consolidated warning or by marking the placebo unavailable when the required placebo control sets are missing.
  3. Add regression coverage for both cases: assert the worked-example overall_se/CI against the committed golden JSON, and add a placebo panel that exercises missing placebo controls on one side.

Addresses two new P1s + a P2 from CI re-review:

P1: degenerate-cohort SE collapses to 0.0
- _plugin_se now returns NaN when sum(U_centered**2) <= 0 (in addition
  to the existing n==0 and divisor<=0 guards). This catches the
  case where every variance-eligible group forms its own
  (D_{g,1}, F_g, S_g) cohort, so cohort recentering produces an
  identically-zero centered IF vector. Returning NaN rather than 0.0
  avoids the failure mode where a zero SE silently implies infinite
  precision.
- fit() now detects this case after the analytical SE call and emits
  a UserWarning explaining that the variance is unidentified, the
  point estimate is still valid, and the bootstrap path inherits the
  same degeneracy on the same data.
- Worked example test now asserts overall_se is NaN with the warning,
  alongside the existing DID_M = 2.5 exact assertion.
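
The guard chain can be sketched as follows. The signature and names are hypothetical; only the guard logic (n == 0, non-positive divisor, and the new sum-of-squares check all return NaN rather than 0.0) reflects the commit message.

```python
import numpy as np

def plugin_se(u_centered: np.ndarray, divisor: float) -> float:
    """Sketch of the degenerate-cohort SE guard (hypothetical signature)."""
    n = u_centered.size
    if n == 0 or divisor <= 0:
        return float("nan")
    ssq = float(np.sum(u_centered ** 2))
    if ssq <= 0.0:
        # Every variance-eligible group is its own (D_{g,1}, F_g, S_g)
        # cohort -> cohort recentering zeroes the whole IF vector.
        return float("nan")
    return float(np.sqrt(ssq)) / divisor

print(plugin_se(np.zeros(4), 2.0))  # nan
```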

P1: placebo A11 silent zeroing
- _compute_placebo now mirrors the main path's A11 distinction:
  "no joiners" stays a natural zero (no flag), while "joiners but no
  3-period stable_0 controls" sets the placebo period contribution
  to zero AND appends a warning string to placebo_a11_warnings.
  Symmetric for leavers/stable_1.
- _compute_placebo's return signature widened from
  Optional[Tuple[float, bool]] to Optional[Tuple[float, bool, List[str]]]
  to expose the A11 warnings to the caller.
- fit() unpacks the new tuple and surfaces a consolidated
  "Placebo (DID_M^pl) Assumption 11 violations" warning when any
  placebo period was zeroed by the A11 contract. Mirrors the main
  DID path's warning format.
- New test_placebo_a11_violation_emits_warning constructs a panel
  where placebo joiners exist but no placebo stable_0 controls do
  (4-group T=3 with 2 joiners + 2 always-treated controls), asserts
  the placebo A11 warning fires.
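
The A11 distinction the placebo path now mirrors can be boiled down to a toy decision function. This is not the library's `_compute_placebo`; it only illustrates the contract described above (no joiners = natural zero with no flag; joiners without stable_0 controls = zeroed contribution plus a warning string).

```python
from typing import List, Tuple

def placebo_contribution(n_joiners: int, n_stable_0: int) -> Tuple[float, List[str]]:
    """Toy illustration of A11 zero-retention (hypothetical names)."""
    if n_joiners == 0:
        return 0.0, []  # natural zero, no flag
    if n_stable_0 == 0:
        # A11 violation: contribution zeroed AND flagged for the caller.
        return 0.0, ["A11: placebo joiners present but no stable_0 controls"]
    return 1.23, []  # placeholder DID value when both sides exist

print(placebo_contribution(0, 0))  # (0.0, [])
print(placebo_contribution(2, 0))
```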

REGISTRY.md: two new Note blocks
- The degenerate-cohort SE contract (NaN + warning, deviation from R)
- The placebo A11 zero-retention contract (mirrors main DID path)

Tests: 105 dCDH passing (was 103, added 2 new). The existing
test_hand_calculable_4group_3period_joiners_and_leavers wraps fit()
in a warnings.catch_warnings() block with simplefilter("ignore") since the
degenerate-cohort warning now fires on this panel; the dedicated
test_worked_example_se_is_unidentified_with_warning asserts the
warning explicitly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner) commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 1e05c9d86fb2cb12451f888fc0e123c3064f2594


Overall Assessment

⚠️ Needs changes

Static review only. I could not execute the new dCDH suite here because pytest, numpy, and pandas are unavailable in this environment.

Executive Summary

  • The prior re-review blockers I spot-checked look addressed: degenerate singleton-cohort analytical SE now surfaces NaN plus a warning instead of 0.0, and placebo A11 zero-retention now emits a consolidated warning, with both behaviors documented and tested.
  • P1 [Newly identified]: ChaisemartinDHaultfoeuille.fit() now errors on groups missing the first global period and drops groups with interior period gaps before computing DID_M, DID_+, DID_-, or DID_M^pl, but the dCDH registry/docs do not document that sample restriction even though the PR also presents DIDmultiplegtDYN as the reference implementation and that manual explicitly supports unbalanced panels. diff_diff/chaisemartin_dhaultfoeuille.py:L534-L607, docs/methodology/REGISTRY.md:L468-L478, docs/methodology/REGISTRY.md:L544-L569, docs/api/chaisemartin_dhaultfoeuille.rst:L18-L43, tests/test_chaisemartin_dhaultfoeuille.py:L462-L507.
  • The documented Phase 1 placebo-SE=NaN contract and the documented degenerate-cohort overall_se=NaN behavior are not blockers.
  • P3: several changed docs/comments still describe never-switching groups as filtered from variance or as true “filter counts,” while the implementation and main registry note now say they are retained and that field is metadata only.

Methodology

Code Quality

  • No separate findings beyond the methodology item above.

Performance

  • No material performance findings in the changed paths reviewed.

Maintainability

  • No separate maintainability findings beyond the documentation drift noted below.

Tech Debt

  • Severity: P3 informational. Impact: I did not find a TODO.md entry tracking the P1 above, so it remains unmitigated under the project’s deferment rules. Concrete fix: resolve or document it in this PR rather than deferring it. TODO.md:L51-L95

Security

  • No material security findings in the changed files reviewed.

Documentation/Tests

Path to Approval

  1. Decide the intended dCDH unbalanced/ragged-panel contract and document it in docs/methodology/REGISTRY.md: if Phase 1 requires first-period observation plus consecutive periods, say so explicitly and label it as a deviation/limitation relative to DIDmultiplegtDYN; otherwise, implement the reference package’s unbalanced-panel handling.
  2. Mirror that contract in the public API/README text that currently advertises DIDmultiplegtDYN parity, so users do not expect R-like behavior on late-entry or interior-gap panels.
  3. Add one regression test that pins the remaining unspoken branch of the contract, e.g. whether terminal missingness / late exit is retained or rejected, so the documented behavior is fully specified.

Documents the dCDH Phase 1 ragged-panel handling contract that was
implemented in Round 3 but not surfaced in any user-facing doc:

- Balanced baseline required: groups missing the first global period
  raise ValueError with offending IDs
- Interior period gaps: groups dropped with UserWarning
- Terminal missingness (early exit / right-censoring): retained,
  contributes from observed periods only via the per-period present
  guard at three sites in the variance computation
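
The three-way contract above can be sketched as a stand-alone panel check. This is an illustrative approximation under assumed column names (`group`, `period`), not the library's validation code; the branch behaviors (ValueError, warn-and-drop, silent retention) are taken from the bullets.

```python
import warnings
import pandas as pd

def check_panel(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the Phase 1 ragged-panel contract (hypothetical names)."""
    first = df["period"].min()
    per_group = df.groupby("group")["period"].agg(["min", "max", "nunique"])
    # 1. Balanced baseline required: error with offending IDs.
    missing_baseline = per_group.index[per_group["min"] > first].tolist()
    if missing_baseline:
        raise ValueError(f"groups missing first period {first}: {missing_baseline}")
    # 2. Interior gaps: drop with a warning.
    span = per_group["max"] - per_group["min"] + 1
    gapped = per_group.index[per_group["nunique"] < span].tolist()
    if gapped:
        warnings.warn(f"dropping groups with interior period gaps: {gapped}",
                      UserWarning)
        df = df[~df["group"].isin(gapped)]
    # 3. Terminal missingness (early exit): retained, contributes from
    #    observed periods only.
    return df
```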

This is a Phase 1 limitation relative to R DIDmultiplegtDYN, which
supports unbalanced panels with documented missing-treatment-before-
first-switch handling. Added a new **Note (deviation from R)** block
in REGISTRY.md, mirrored in the API rst, README, and CHANGELOG so
users do not expect R-like behavior on late-entry or interior-gap
panels.

Also fixes doc drift on never-switching groups: after the Round 2
full-IF fix they participate in the variance via stable-control roles
and are no longer filtered, but CHANGELOG/README/REGISTRY checklist
and a methodology test comment still described them as filtered. The
n_groups_dropped_never_switching field stays as backwards-compat
metadata only (already documented correctly in the dataclass docstring
since Round 2).

Adds test_terminal_missingness_retained as a regression test pinning
the contract: a group with periods [0, 1, 2] in a 5-period panel is
retained in results.groups, the fit completes without error, and
the per-period DIDs are populated.

Files modified:
- docs/methodology/REGISTRY.md (assumption check + new deviation Note + checklist)
- docs/api/chaisemartin_dhaultfoeuille.rst (Phase 1 panel requirements section)
- README.md (Note block + n_groups_dropped_never_switching field row split)
- CHANGELOG.md (panel requirement sub-bullet + line 17 never-switching fix)
- tests/test_chaisemartin_dhaultfoeuille.py (test_terminal_missingness_retained)
- tests/test_methodology_chaisemartin_dhaultfoeuille.py (fix test comment)

Test count: 105 -> 106 (one new regression test).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner) commented Apr 11, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: cf639d44a87b2f2c322284bd3a289578c473b546


Overall Assessment

⚠️ Needs changes

Static review only. I could not execute the added dCDH tests in this runner because pytest and numpy are not installed here, although the project declares them and CI installs them in normal jobs. pyproject.toml:L48-L57, .github/workflows/rust-test.yml:L191-L199

Executive Summary

Methodology

The previous ragged-panel methodology blocker appears resolved; I do not see a remaining finding there. docs/methodology/REGISTRY.md:L477-L477, docs/api/chaisemartin_dhaultfoeuille.rst:L52-L72

Code Quality

No additional findings beyond the public cluster no-op. The changed dCDH inference path consistently routes through safe_inference(). diff_diff/chaisemartin_dhaultfoeuille.py:L920-L942, diff_diff/chaisemartin_dhaultfoeuille.py:L1023-L1038

Performance

No material findings in the changed paths I reviewed.

Maintainability

Tech Debt

  • Severity: P3. Impact: I did not find a dCDH entry in TODO.md that would mitigate the cluster issue, so it remains unmitigated under the project’s deferment rules. Concrete fix: resolve or explicitly document/gate the behavior in this PR rather than deferring it. TODO.md:L51-L97

Security

No material security findings in the changed files reviewed.

Documentation/Tests

Path to Approval

  1. Choose a Phase 1 cluster contract and make it explicit in code: either reject any non-None cluster with a clear error, or implement cluster-aware analytical/bootstrap inference end-to-end.
  2. If cluster is intentionally forward-compatible only, add an explicit dCDH **Note:** in docs/methodology/REGISTRY.md and mirror that contract in the public docs so the behavior is not silent.
  3. Add a regression test that pins the chosen cluster behavior, e.g. ChaisemartinDHaultfoeuille(cluster="state").fit(...) should either raise a clear Phase 1 message or exercise documented cluster-aware inference.

…nguage

Resolves three findings from the prior CI review:

P1 — cluster parameter was a public no-op. The constructor exposed
`cluster: Optional[str] = None` and stored it on `self.cluster`, but
neither `fit()` nor `_compute_dcdh_bootstrap()` ever read it, so
`cluster="state"` silently produced the same group-level inference as
`cluster=None`. The Phase 1 contract is now: dCDH always clusters at
the group level (via the cohort-recentered influence function and the
multiplier bootstrap), and any non-None cluster value raises
`NotImplementedError` at construction time. The same gate fires from
`set_params`. Added `test_cluster_parameter_raises_not_implemented`
covering all four entry points (`__init__`, `set_params`, the
`cluster=None` happy path, and the convenience function).
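
A minimal sketch of the construction-time gate, under the assumption that the same check simply fires from both entry points (the class below is a toy, not the library's estimator):

```python
from typing import Optional

class ChaisemartinDHaultfoeuilleSketch:
    """Toy illustration of the Phase 1 cluster contract: inference is
    always clustered at the group level, so any non-None cluster is
    rejected at construction time and from set_params."""

    def __init__(self, cluster: Optional[str] = None):
        if cluster is not None:
            raise NotImplementedError(
                "Phase 1 always clusters at the group level; cluster must be None"
            )
        self.cluster = cluster

    def set_params(self, **params):
        # Same gate fires on the setter path.
        if params.get("cluster") is not None:
            raise NotImplementedError(
                "Phase 1 always clusters at the group level; cluster must be None"
            )
        for name, value in params.items():
            setattr(self, name, value)
        return self
```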

P3 — TWFE diagnostic was running on the post-Step-5a sample but the
inline comment claimed "FULL pre-filter cell dataset" and the
standalone `twowayfeweights()` function used the truly pre-filter
cell. On ragged panels with interior gaps, the two paths diverged.
Fixed by reordering: the TWFE diagnostic now runs as Step 5a (was
5b) BEFORE the ragged-panel validation, which is now Step 5b (was
5a). The blocks are independent — the diagnostic just reads `cell`
while the ragged-panel block mutates it. Both API surfaces now
operate on the same `_validate_and_aggregate_to_cells()` output.

P3 — API rst step 3 said "Filters singleton-baseline groups" which
read as a point-estimate sample drop. After Round 3, singleton-
baseline groups are excluded from the variance computation only
(they remain in the point-estimate sample as period-based stable
controls). Fixed the language to match.

Documentation:
- REGISTRY.md gets a new `**Note (Phase 1 cluster contract):**`
  block in the dCDH section explaining the always-group-level
  semantics and the construction-time NotImplementedError gate.
- API rst step 3 reflects the variance-only singleton-baseline scope.
- The dCDH `cluster` parameter docstring now describes the Phase 1
  contract instead of "Currently unused — analytical SEs are always
  at the group level via the cohort-recentered plug-in."

Test counts: 106 -> 107 (one new cluster gate regression test).

Files modified:
- diff_diff/chaisemartin_dhaultfoeuille.py (cluster gate in __init__
  and set_params, docstring update, TWFE diagnostic block reorder
  with renumbered Step 5a/5b labels)
- docs/methodology/REGISTRY.md (cluster contract Note + rename
  Step 5a -> Step 5b in the ragged-panel deviation Note)
- docs/api/chaisemartin_dhaultfoeuille.rst (singleton-baseline
  language fix in step 3 of the overview)
- tests/test_chaisemartin_dhaultfoeuille.py
  (test_cluster_parameter_raises_not_implemented in
  TestForwardCompatGates; rename Step 5a -> Step 5b in three
  ragged-panel test docstrings)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner) commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 83cc093c82faeaac67b9dadceeefbc3dfe10b5e9


Overall Assessment

✅ Looks good

This re-review clears the prior P1 blocker: cluster is no longer a silent no-op in Phase 1, and I did not find a new unmitigated P0/P1 in the changed dCDH estimator or its supporting files. Static review only; I could not execute the added tests here because pytest is not installed in this runner.

Executive Summary

Methodology

Affected methods: Phase 1 dCDH DID_M, DID_+, DID_-, DID_M^pl, plus the standalone Theorem 1 twowayfeweights() diagnostic.

No unmitigated P0/P1 findings. The changed implementation matches the registry on the core Theorem 3 / Theorem 4 formulas, the Section 3.7.3 cohort-recentered variance, and the explicitly documented deviations from R, so I am not treating those documented differences as defects (docs/methodology/REGISTRY.md:L480, docs/methodology/REGISTRY.md:L517, docs/methodology/REGISTRY.md:L565, docs/methodology/REGISTRY.md:L569, diff_diff/chaisemartin_dhaultfoeuille.py:L785, diff_diff/chaisemartin_dhaultfoeuille.py:L1506).

Code Quality

No material findings. The changed dCDH inference path uses safe_inference() rather than inline t-stat / p-value / CI logic, so I did not find new partial-NaN or inline-inference anti-patterns (diff_diff/chaisemartin_dhaultfoeuille.py:L950, diff_diff/utils.py:L152).

Performance

No material findings from static review. I did not benchmark in this runner.

Maintainability

Tech Debt

  • Severity: P3 (tracked in TODO.md). Impact: the broader gap that allowed the .txt AI guide to drift is already tracked: .txt documentation is outside current smoke validation, so this does not block the PR under the repo’s deferment rules (TODO.md:L93, docs/llms-full.txt:L238).
    Concrete fix: no blocker for this PR; follow the existing TODO by extending doc-snippet or consistency checks to docs/*.txt.

Security

No security findings in the changed files reviewed.

Documentation/Tests

Static only here: I could not run the added tests because pytest is unavailable in this runner. The changed suite itself does cover the key edge cases from the checklist, including missing baseline, interior gaps, terminal missingness, A11 zero-retention, placebo bootstrap deferral, and R golden-value parity (tests/test_chaisemartin_dhaultfoeuille.py:L509, tests/test_chaisemartin_dhaultfoeuille.py:L528, tests/test_chaisemartin_dhaultfoeuille.py:L556, tests/test_chaisemartin_dhaultfoeuille.py:L690, tests/test_chaisemartin_dhaultfoeuille.py:L922, tests/test_chaisemartin_dhaultfoeuille_parity.py:L123).

Two P3 doc-drift cleanups from the prior CI review.

P3 — Bootstrap helper docstring drift. The dCDH bootstrap mixin
documented the re-aggregation divisor as a "count of contributing
groups" with the formula `W @ u_centered / n_groups_target`, but
the live code (and the registry) use switching-cell denominators
(`N_S`, joiner cells, leaver cells) per Theorem 3 of AER 2020. The
numbers were correct; only the docstring drifted. Renamed the public
mixin parameter from `n_groups_overall` to `divisor_overall` and
updated the docstring to describe the divisor as the switching-cell
count from the Theorem 3 weighting formula. The joiners/leavers
tuple description in the docstring also says `(u_centered, divisor,
original_effect)` instead of `(u_centered, n_groups, original_effect)`
to match. The fit() call site at line 1041 passes `divisor_overall=N_S`
under the new name.
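
One multiplier-bootstrap draw under the renamed contract can be sketched as below. The vector sizes and values are toy data; only the shape of the re-aggregation (`W @ u_centered / divisor` with a switching-cell divisor such as N_S, not a group count) follows the docstring fix.

```python
import numpy as np

rng = np.random.default_rng(0)
u_centered = rng.normal(size=6)  # toy centered per-group IF contributions
N_S = 4.0                        # switching-cell count (the Theorem 3 divisor)

# One Rademacher multiplier draw: perturb group contributions, then
# re-aggregate with the switching-cell divisor.
W = rng.choice([-1.0, 1.0], size=u_centered.size)
draw = W @ u_centered / N_S
```

Dividing by a "count of contributing groups" here would rescale every draw, which is why the parameter rename matters even though the live numbers were already correct.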

P3 — `docs/llms-full.txt` line 238 said the dCDH `cluster` parameter
was "Reserved for future use," which is stale after Round 7's
NotImplementedError gate. Updated to "Phase 1: must be None; non-None
raises NotImplementedError" to match REGISTRY.md and the constructor
contract.

No behavior change; both fixes are purely textual / parameter
naming. 107 tests still pass; black, ruff clean.

Files modified:
- diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
  (rename n_groups_overall -> divisor_overall, expanded docstring
  describing the cell-count divisor and the formula `W @ u_centered
  / divisor`)
- diff_diff/chaisemartin_dhaultfoeuille.py
  (call site at line 1041 updates the keyword argument to match)
- docs/llms-full.txt
  (cluster comment matches the Phase 1 contract)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
igerber (Owner) commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8cafae13ad3d7d1516c7a198b9331730dd5414fb


Overall Assessment

⚠️ Needs changes — one unmitigated P1 finding.

Executive Summary

  • Previous blocker resolved: cluster is now rejected in both __init__ and set_params, matching the Phase 1 contract.
  • Affected methodology here is the dCDH DID_M / placebo surface plus the AER 2020 TWFE decomposition diagnostic, with analytical variance mapped to the intertemporal DiD working paper. (aeaweb.org)
  • Severity P1 [Newly identified]: fitted twfe_* outputs are computed on the pre-filter cell sample, while overall_att, results.groups, and inference use the post-filter sample. When interior-gap or multi-switch filters fire, the diagnostic is not actually from the “same fit” / “same data” as documented.
  • That deviation is not documented in docs/methodology/REGISTRY.md and is not tracked in TODO.md, so it remains unmitigated.
  • Static review only: this runner does not have numpy, pandas, or pytest, so I could not execute the added tests.

Methodology

Affected methodology: the dCDH DID_M / placebo estimator family and the TWFE decomposition diagnostic introduced by de Chaisemartin and D’Haultfoeuille, plus the dynamic-paper variance mapping. (aeaweb.org)

Beyond that sample-contract issue, I did not find another unmitigated methodology defect in the changed DID_M, placebo, or cohort-recentered inference paths.

Code Quality

No findings. I did not see a new inline-inference or partial-NaN regression; the changed estimator path routes inference through diff_diff/utils.py:L152.

Performance

No separate findings from static review. The blocker above also does unnecessary TWFE work on rows later dropped, but the correctness/contract problem is the part that needs action.

Maintainability

No separate findings.

Tech Debt

No separate deferrable finding. The P1 above is not tracked under the TODO.md “Tech Debt from Code Reviews” section, so it is not mitigated by project deferment rules.

Security

No findings.

Documentation/Tests

Path to Approval

  1. Make the fitted twfe_* results use the same filtered cell sample as overall_att and results.groups, or explicitly split pre-filter vs post-filter diagnostics and warn when they differ.
  2. Add regression coverage for both filter paths that change the estimation sample: one interior-gap drop case and one drop_larger_lower=True multi-switch-drop case.
  3. Update the dCDH docstring, results docstring, API docs, and estimator-selection docs so their “same fit” / “same data” wording matches the implemented contract.

Documents the dCDH TWFE diagnostic sample contract that Round 7's
swap left implicit. The fitted results.twfe_* values are computed on
the FULL pre-filter cell sample (matching the standalone
twowayfeweights() function), NOT on the post-filter estimation
sample used by overall_att / results.groups / inference fields. The
existing user-facing wording said "TWFE on the same data" /
"diagnostic from the same fit" — phrases that naturally read as
"same data as overall_att" — which contradicted the post-Round-7
behavior. This commit:

1. Adds a new `**Note (TWFE diagnostic sample contract):**` block in
   REGISTRY.md enumerating all three sample-shaping filters
   (interior-gap, multi-switch, singleton-baseline) and explicitly
   carving singleton-baseline as variance-only (no fitted-vs-overall_att
   mismatch, so no warning).

2. Rewrites the `twfe_diagnostic` parameter docstring in
   chaisemartin_dhaultfoeuille.py to describe the pre-filter contract
   and the divergence warning.

3. Rewrites the twfe_weights / twfe_fraction_negative / twfe_sigma_fe
   / twfe_beta_fe field docstrings in the results dataclass to clarify
   they describe the FULL pre-filter cell sample, with a pointer to
   the REGISTRY contract Note.

4. Adds a `UserWarning` from `fit()` whenever the user requested the
   TWFE diagnostic AND any of the interior-gap or multi-switch filters
   dropped groups. The warning explains the divergence with explicit
   counts and points at REGISTRY for the rationale. The warning fires
   regardless of whether the diagnostic itself succeeded or hit the
   rank-deficient fallback (the plan-review correctly flagged that the
   `twfe_diagnostic_payload is not None` guard would swallow the rare
   rank-deficient + filtered-panel intersection — dropped that guard).

5. Updates docs/api/chaisemartin_dhaultfoeuille.rst and
   docs/choosing_estimator.rst to replace "from the same fit" with
   "computed on the data you pass in (pre-filter)".

6. Adds three regression tests in TestTwowayFeweightsHelper:
   - test_twfe_pre_filter_contract_with_interior_gap_drop: panel with
     a dropped interior-gap group, asserts fitted twfe_* matches
     standalone, estimation sample is smaller, and the divergence
     warning fires with the expected counts.
   - test_twfe_pre_filter_contract_with_multi_switch_drop: panel with
     an injected multi-switch crosser, similar assertions.
   - test_twfe_no_divergence_warning_on_clean_panel: negative test
     asserting NO divergence warning fires on a clean panel
     (hard-codes pattern="single_switch" to close a future footgun).

7. Fixes the stale "Step 5a guarantees..." comment at line 712 to
   "Step 5b guarantees..." (post-Round-7 the ragged-panel validation
   is Step 5b, not Step 5a). Independent cleanup; bundled because
   it's in the same file and the same topic.
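The divergence warning described in item 4 can be sketched as follows. This is a minimal illustration, not the library's actual code: the function name, parameter names, and message wording are assumptions; only the contract (warn when the diagnostic was requested and any filter dropped groups, with explicit counts) comes from the change description above.

```python
import warnings


def warn_on_twfe_sample_divergence(twfe_diagnostic: bool,
                                   n_dropped_interior_gap: int,
                                   n_dropped_multi_switch: int):
    """Hypothetical sketch of the Step 6b divergence warning.

    Fires whenever the user requested the TWFE diagnostic AND any
    pre-estimation filter dropped groups, regardless of whether the
    diagnostic itself succeeded.
    """
    if not twfe_diagnostic:
        return None
    n_dropped = n_dropped_interior_gap + n_dropped_multi_switch
    if n_dropped == 0:
        return None
    msg = (
        f"TWFE diagnostic was computed on the full pre-filter cell sample, "
        f"but {n_dropped} group(s) were dropped before dCDH estimation "
        f"({n_dropped_interior_gap} interior-gap, "
        f"{n_dropped_multi_switch} multi-switch), so the diagnostic and "
        "estimation samples diverge. See the REGISTRY sample-contract Note."
    )
    warnings.warn(msg, UserWarning)
    return msg
```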

This resolution preserves Round 7's standalone-vs-fitted parity
(both APIs use the pre-filter cell sample) and addresses Round 9's
P1 about the documentation contract. Both reviewers' concerns are
now satisfied: the standalone and fitted paths produce identical numbers
on the same input, AND users see an explicit warning when filters
make the fitted sample diverge from the dCDH estimation sample.

Test counts: 107 -> 110 (three new sample-contract regression
tests). Black, ruff clean.

Files modified:
- docs/methodology/REGISTRY.md
  (new TWFE sample contract Note enumerating all three filters)
- diff_diff/chaisemartin_dhaultfoeuille.py
  (twfe_diagnostic param docstring, n_groups_dropped_interior_gap
  tracking, divergence warning at Step 6b, stale comment fix)
- diff_diff/chaisemartin_dhaultfoeuille_results.py
  (twfe_weights / twfe_fraction_negative / twfe_sigma_fe /
  twfe_beta_fe field docstrings)
- docs/api/chaisemartin_dhaultfoeuille.rst (wording fix)
- docs/choosing_estimator.rst (wording fix)
- tests/test_chaisemartin_dhaultfoeuille.py (3 new tests + 1
  parity test comment update)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 63edb3d9041ef37a53b58a34b6de32539ecfb2bd


Overall Assessment

Needs changes. One unmitigated P1 finding.

Executive Summary

Methodology

Code Quality

No unmitigated findings beyond the bootstrap inference-surface issue above.

Performance

No separate performance findings from static review.

Maintainability

No separate maintainability findings from the changed code.

Tech Debt

No separate deferrable finding. The bootstrap inference-surface mismatch is not tracked in TODO.md, so it remains unmitigated. See TODO.md:L51-L110.

Security

No findings.

Documentation/Tests

  • Severity P2. Impact: the new bootstrap tests only verify that bootstrap_results is populated and that placebo bootstrap stays unavailable; they do not assert that the top-level public fields (overall_p_value, overall_conf_int, joiner/leaver analogues, event_study_effects[1]) or user-facing renderers (summary(), to_dataframe()) actually reflect bootstrap inference. That gap is why the P1 above would pass CI. Separately, the README currently describes the top-level overall_* fields as analytical, which conflicts with the runtime behavior once bootstrap is requested. Concrete fix: add a regression that compares the top-level inference fields and rendered summaries/tables against bootstrap_results when n_bootstrap > 0, and update the README / API docs / result docstrings to state exactly which fields switch to bootstrap inference. Refs: tests/test_chaisemartin_dhaultfoeuille.py:L867-L955, README.md:L1204-L1214, docs/api/chaisemartin_dhaultfoeuille.rst:L196-L206.

Path to Approval

  1. In diff_diff/chaisemartin_dhaultfoeuille.py:L1108-L1122, propagate bootstrap p-values and CIs from br into the top-level overall_*, joiners_*, and leavers_* result fields instead of recomputing them with safe_inference().
  2. Add a regression test that fits with n_bootstrap > 0 and asserts the public inference surface (overall_*, joiners_*, leavers_*, event_study_effects[1]) matches bootstrap_results.
  3. Add user-facing coverage that summary() and to_dataframe() surface the corrected bootstrap inference, and update the dCDH docs so the semantics of the top-level fields are explicit when bootstrap is enabled.

Fixes a P1 where the dCDH bootstrap branch silently replaced the
multiplier-bootstrap percentile p-values and CIs with normal-theory
recomputations from the bootstrap SE. The bootstrap helper computes
the per-target SE / CI / p-value triples correctly on the
DCDHBootstrapResults object, but fit() was only copying the SE and
then calling safe_inference() which returns normal-theory p-values
and CIs. The user requested multiplier bootstrap inference and got
a hybrid (bootstrap SE + normal-theory p/CI) — the public surface
fields, event_study_effects[1], summary(), to_dataframe(),
is_significant, and significance_stars all reflected the
normal-theory recomputations instead of the bootstrap inference.

The fix: in the fit() bootstrap branch, propagate br.overall_p_value
and br.overall_ci directly to the top-level overall_p / overall_ci
(and joiners/leavers analogues). Keep the t-stat from
safe_inference()[0] since percentile bootstrap doesn't define an
alternative t-stat semantic, and that satisfies the project
anti-pattern rule (never compute t = effect/se inline).
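The propagation contract can be sketched in miniature. The dataclass and function names here are stand-ins, not the library's real `DCDHBootstrapResults` or `fit()` internals; the sketch only illustrates the rule stated above: percentile p-value and CI come from the bootstrap object, while the t-stat stays SE-derived.

```python
from dataclasses import dataclass


@dataclass
class BootstrapResults:
    """Hypothetical stand-in for the bootstrap payload."""
    overall_se: float
    overall_p_value: float   # percentile p-value
    overall_ci: tuple        # percentile CI


def propagate_bootstrap_inference(fields: dict, br: BootstrapResults,
                                  se_based_t_stat: float) -> dict:
    # Copy percentile inference straight from the bootstrap object instead
    # of recomputing normal-theory p/CI from the bootstrap SE.
    fields["overall_se"] = br.overall_se
    fields["overall_p_value"] = br.overall_p_value
    fields["overall_conf_int"] = br.overall_ci
    # Only the t-stat stays SE-derived: percentile bootstrap defines no
    # alternative t-stat semantic.
    fields["overall_t_stat"] = se_based_t_stat
    return fields
```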

Library precedent: imputation.py:790-805, two_stage.py:778-787, and
efficient_did.py:1009-1013 all propagate bootstrap p/CI to the
public surface while keeping a SE-derived t-stat. dCDH was the only
modern bootstrap-enabled estimator that didn't follow this pattern.

Documentation updates:
- New `**Note (bootstrap inference surface):**` block in REGISTRY.md
  documenting the propagation contract, the rationale for the
  SE-based t-stat, and the placebo carve-out (placebo bootstrap
  remains deferred to Phase 2).
- Inference-method switch paragraph added to the
  ChaisemartinDHaultfoeuilleResults class docstring `Notes` section.
- README.md row updated to clarify "cohort-recentered analytical SE
  by default; multiplier-bootstrap percentile inference when
  n_bootstrap > 0".
- API rst Multiplier bootstrap example now shows that
  results.overall_p_value and results.overall_conf_int reflect the
  bootstrap inference (not just bootstrap_results.overall_*).

Regression test added: test_bootstrap_p_value_and_ci_propagated_to_top_level
asserts results.overall_p_value == bootstrap_results.overall_p_value,
overall_conf_int == bootstrap_results.overall_ci, the joiners/leavers
analogues, event_study_effects[1] reflects bootstrap, the t-stat is
the SE-derived value, summary() doesn't crash, and to_dataframe()
returns the bootstrap-derived numbers. This pins the contract so
the silent inference-method swap can't regress.

Test counts: 110 -> 111 (one new bootstrap propagation regression
test). The existing TestBootstrap tests at lines 855-955 only
asserted that bootstrap_results was populated and SE was finite,
which is why the silent swap passed CI for nine commits.

Files modified:
- diff_diff/chaisemartin_dhaultfoeuille.py
  (fit() bootstrap branch propagates p/CI from br instead of
  recomputing via safe_inference)
- diff_diff/chaisemartin_dhaultfoeuille_results.py
  (class docstring `Notes` section gains the inference-method
  switch paragraph)
- docs/methodology/REGISTRY.md
  (new bootstrap inference surface Note)
- README.md (field-description row clarification)
- docs/api/chaisemartin_dhaultfoeuille.rst
  (Multiplier bootstrap example reflects the new propagation)
- tests/test_chaisemartin_dhaultfoeuille.py
  (test_bootstrap_p_value_and_ci_propagated_to_top_level)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes:

P1 — Within-cell-varying treatment now raises ValueError instead of
silently rounding to majority. Phase 1 dCDH requires binary treatment
to be constant within each (group, time) cell; fractional d_gt values
(from individual-level data where some units in a cell are treated and
others are not) indicate a fuzzy design that Phase 1 does not support.
The previous behavior (UserWarning + majority-round) silently mutated
switcher/control membership before Theorem 3 arithmetic, changing the
estimand without the user's knowledge. The ValueError lists the
affected cells and points users at pre-aggregation. The "binary fuzzy
designs" claim has been removed from README, CHANGELOG, REGISTRY, and
choosing_estimator.rst. Both fit() and twowayfeweights() share the
same _validate_and_aggregate_to_cells() rejection via the existing
shared helper.
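A stdlib sketch of the rejection, assuming a hypothetical helper name and error wording (the real check lives in `_validate_and_aggregate_to_cells()` and operates on a DataFrame; this version works on plain tuples purely for illustration):

```python
from collections import defaultdict


def validate_cell_treatment(rows):
    """Hypothetical sketch: Phase 1 dCDH requires binary treatment to be
    constant within each (group, time) cell; mixed cells indicate a fuzzy
    design that Phase 1 does not support.

    rows: iterable of (group, time, treatment) with 0/1 treatment.
    """
    cells = defaultdict(set)
    for g, t, d in rows:
        cells[(g, t)].add(d)
    bad = sorted(cell for cell, values in cells.items() if len(values) > 1)
    if bad:
        raise ValueError(
            f"Treatment varies within (group, time) cell(s) {bad}; "
            "Phase 1 dCDH requires a sharp design. "
            "Pre-aggregate units to cells before fitting."
        )
```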

Tests:
- test_twowayfeweights_warns_on_within_cell_rounding renamed to
  test_twowayfeweights_rejects_within_cell_varying_treatment (now
  asserts ValueError instead of UserWarning)
- test_fit_rejects_within_cell_varying_treatment added (same panel
  via the fit() entry point)

CI fix — test_recovery_joiners_only_n200 was failing on arm64 with
seed 43 (assert 1.5 <= 1.276 — CI coverage assertion failed). Changed
to a point-estimate proximity assertion (abs(overall_att - 1.5) < 0.5)
which is stable across architectures and seeds. CI coverage checks are
inherently stochastic and require many replications to be reliable;
point-estimate proximity is the right assertion for a single-seed
large-N recovery test.

P3 — Fixed stale comment at line 1039 that said "in Phase 1 we
approximate [placebo SE] using the same plug-in formula" when the
actual behavior is intentionally NaN. Updated to match the warning
text and the REGISTRY placebo SE Note.

Test counts: 111 -> 112. Black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: a8b161c5fa1e713c331ec7c77690336c9df40b1e


Overall Assessment

⚠️ Needs changes.

Highest unmitigated severity is P1. This re-review is based on static inspection of the changed files; I could not run the new dCDH tests on this runner because pytest, numpy, and pandas are not installed.

Executive Summary

Methodology

  • Severity P3 informational. Impact: I did not find an unmitigated source-material mismatch in the changed estimator math. The previously raised mixed-cell-treatment issue is fixed, and the remaining substantive Phase 1 departures I checked, including placebo SE deferral and the balanced-baseline/no-interior-gap panel restriction, are explicitly documented in REGISTRY.md, so they are not defects under the review rubric. At the paper level, the cited source material is consistent with the feature’s intended scope: the AER paper motivates an alternative to potentially negatively weighted TWFE, and the dynamic companion paper explicitly studies non-absorbing, lagged-treatment settings. (aeaweb.org) Concrete fix: None. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:L201-L216, docs/methodology/REGISTRY.md:L470-L475, docs/methodology/REGISTRY.md:L555-L575

Code Quality

Performance

No findings from static review of the changed files.

Maintainability

Tech Debt

No separate finding. I did not find a dCDH-specific mitigation entry under TODO.md:L51-L97 that would downgrade the P1 above.

Security

No findings.

Documentation/Tests

  • Severity P3. Impact: the changed dCDH tests pin adjacent public contracts like cluster gating and bootstrap-surface propagation, but I did not find a corresponding regression for fit(..., rank_deficient_action="error") on the fitted TWFE path, which is why the current P1 was able to slip through. Concrete fix: add a dCDH regression test with a deliberately rank-deficient FE design asserting that fit() raises under "error" and remains non-fatal only under "warn"/"silent". Refs: tests/test_chaisemartin_dhaultfoeuille.py:L330-L371, tests/test_chaisemartin_dhaultfoeuille.py:L974-L1035
  • I could not execute the dCDH tests on this runner because pytest, numpy, and pandas are unavailable.

Path to Approval

  1. Honor rank_deficient_action="error" in ChaisemartinDHaultfoeuille.fit() by re-raising rank-deficiency failures from _compute_twfe_diagnostic() instead of downgrading them to warning + skip.
  2. Add a regression test that constructs a rank-deficient TWFE diagnostic design and asserts fit(..., twfe_diagnostic=True, rank_deficient_action="error") raises, while "warn"/"silent" preserve the intended non-fatal behavior.
  3. Update the stale dCDH docstrings so they match the current Phase 1 contract: fractional within-cell treatment raises ValueError; it is not rounded.

@igerber removed the ready-for-ci (Triggers CI test workflows) label Apr 12, 2026
P1: the blanket except Exception around the TWFE diagnostic call in
fit() Step 5a swallowed ALL exceptions, including the ValueError that
solve_ols raises when rank_deficient_action="error". This broke the
public-parameter contract: a user requesting strict failure on a
rank-deficient TWFE design would get a successful fit with the
diagnostic silently omitted instead. The standalone twowayfeweights()
already honored the parameter correctly; only the fitted path was
broken.

Fix: in the except block, re-raise ValueError when
rank_deficient_action=="error" so the user's strict-failure request
is honored. Other exceptions (genuinely non-fatal diagnostic failures)
are still downgraded to a warning + skipped diagnostic.
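The shape of the fixed except block, as a hedged sketch (the `compute` callable stands in for the internal `_compute_twfe_diagnostic()` call; the warning text is illustrative):

```python
import warnings


def run_twfe_diagnostic(compute, rank_deficient_action: str = "warn"):
    """Sketch of the fixed Step 5a handler: honor the user's strict-failure
    request while keeping other diagnostic failures non-fatal."""
    try:
        return compute()
    except ValueError:
        if rank_deficient_action == "error":
            raise  # user asked for strict failure on rank deficiency
        warnings.warn("TWFE diagnostic failed and was skipped.", UserWarning)
        return None
    except Exception:
        # Genuinely non-fatal diagnostic failures stay warning + skip.
        warnings.warn("TWFE diagnostic failed and was skipped.", UserWarning)
        return None
```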

P3: fixed two stale docstrings that still described the old "warning
+ majority rounding" behavior for fractional within-cell treatment
(the _validate_and_aggregate_to_cells step 5 docstring and the
fit() group parameter docstring). Both now correctly describe the
Phase 1 ValueError rejection.

Regression test: test_rank_deficient_action_error_raises_on_fitted_twfe
in TestForwardCompatGates. Uses a minimal 1-group 2-period panel
(2 cells < 3 FE columns) to trigger the underdetermined-system
ValueError. Asserts rank_deficient_action="error" raises, and
rank_deficient_action="warn" does NOT raise the TWFE-specific error.

Test counts: 112 -> 113. Black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: b5e184781eeffd3391783549fee8ec3b23abcf90


Overall Assessment
⚠️ Needs changes.

Highest unmitigated severity is P1. This re-review resolves the prior fitted-TWFE rank_deficient_action="error" blocker, but two P1 issues remain: one methodology-surface deviation that is not documented as such, and one missing empty-input guard on the new standalone diagnostic.

Executive Summary

Methodology

Code Quality

  • Severity P1. Impact: twowayfeweights() validates columns and aggregates to cells, but it does not guard the empty-frame case before calling _compute_twfe_diagnostic(). Statically, an empty cell frame drives _build_group_time_design() to n_groups == 0, n_times == 0, and therefore n_cols = -1, which can surface as a low-level negative-dimensions error instead of the clear period-count validation that fit() emits later. This is exactly the missing empty-result handling the review checklist calls out for new code paths. Concrete fix: reject cell.empty (or n_groups == 0 or n_times == 0) in twowayfeweights() / _compute_twfe_diagnostic() with a clear ValueError, and keep the message aligned with fit()’s explicit “at least 2 distinct time periods” contract. Refs: diff_diff/chaisemartin_dhaultfoeuille.py:L623-L628, diff_diff/chaisemartin_dhaultfoeuille.py:L1959-L1967, diff_diff/chaisemartin_dhaultfoeuille.py:L2151-L2200

Performance
No findings from static review of the changed files.

Maintainability
No separate findings beyond the methodology-surface documentation issue above.

Tech Debt
No separate findings. Neither blocking issue is tracked under TODO.md:L51-L97, so neither is mitigated by the deferred-work policy.

Security
No findings.

Documentation/Tests

  • Severity P3. Impact: the shared-validation tests for twowayfeweights() cover NaN treatment, NaN outcome, non-binary treatment, and within-cell-varying treatment, but not the empty-frame case, which is why the P1 above is currently unpinned. Concrete fix: add a regression that passes an empty DataFrame with group/period/treatment/outcome columns to twowayfeweights() and asserts a clear ValueError. Refs: tests/test_chaisemartin_dhaultfoeuille.py:L889-L900, tests/test_chaisemartin_dhaultfoeuille.py:L1558-L1647
  • I could not execute the new dCDH tests on this runner because numpy, pandas, and pytest are not installed.

Path to Approval

  1. Resolve the raw-microdata weighting contract. Either:
    • change internal aggregation so uneven group × time cell sizes are weighted the way did_multiplegt_dyn documents for more-disaggregated input, or
    • if the equal-cell rule is intentional, add a labeled **Note (deviation from R DIDmultiplegtDYN):** to docs/methodology/REGISTRY.md, narrow the “matches R at l=1” language in docs/api/chaisemartin_dhaultfoeuille.rst, and add a parity/regression test that exercises unequal cell sizes explicitly.
  2. Add explicit empty-input validation to twowayfeweights() (or _compute_twfe_diagnostic()), so an empty DataFrame with the required columns raises a clear ValueError, and add a regression test for that case.

Two P1 fixes from CI review:

P1 — Cell-count weighting deviation from R now has a labeled
`**Note (deviation from R DIDmultiplegtDYN):**` in REGISTRY.md.
The equal-cell weighting is paper-literal (AER 2020 Theorem 3
defines N_{a,b,t} as cell counts), but R DIDmultiplegtDYN weights
by within-cell observation count on individual-level input. The
two agree on cell-aggregated input where every cell has the same
number of observations (which is how the parity tests are built).
Narrowed the API rst parity language to "on cell-aggregated input."

P1 — twowayfeweights() / _build_group_time_design() now guards
against empty cell DataFrames with a clear ValueError instead of
falling into a low-level negative-dimensions error from n_groups=0.
Added test_twowayfeweights_rejects_empty_input.
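A minimal sketch of the guard, assuming a hypothetical helper name; the message wording follows the "at least 2 distinct time periods" contract the review cites for fit():

```python
def check_nonempty_cells(groups, times):
    """Sketch: reject empty or degenerate cell frames before building the
    group x time design, instead of letting n_cols go negative."""
    n_groups = len(set(groups))
    n_times = len(set(times))
    if n_groups == 0 or n_times < 2:
        raise ValueError(
            "twowayfeweights() requires a non-empty panel with at least "
            f"2 distinct time periods (got {n_groups} group(s), "
            f"{n_times} period(s))."
        )
```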

Test counts: 113 -> 114. Black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 81871d4433873b6ffd539b60f95a4837e49753b9


Overall Assessment

⚠️ Needs changes.

Highest unmitigated severity is P1: the new dCDH estimator path still lacks a guard or documented contract for globally missing time periods, so it can silently reinterpret calendar gaps as adjacent periods.

Executive Summary

Methodology

Code Quality

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3. Impact: the P1 above is not tracked under the code-review tech-debt table in TODO.md, so the deferred-work policy does not mitigate it. Ref: TODO.md:L51.
    Concrete fix: address the issue in this PR; TODO tracking would not downgrade a P1 missing-assumption-check problem.

Security

No findings.

Documentation/Tests

Path to Approval

  1. Add fit() validation that rejects or explicitly warns on nonconsecutive global periods for numeric/datetime time inputs, instead of silently treating adjacent observed periods as adjacent calendar periods.
  2. Update the dCDH docs to state the even-spacing requirement clearly, alongside the existing ragged-panel note, so users can distinguish supported terminal missingness from unsupported globally missing periods.
  3. Add a regression test that constructs a panel with a shared global gap and asserts the chosen ValueError/warning behavior.

P1: the estimator operates on sorted period indices (matching the AER
2020 paper and R DIDmultiplegtDYN) but this was not documented. A
panel with periods [2000, 2001, 2003] treats 2003 as the immediate
successor of 2001 without warning. Added a **Period-index semantics**
assumption check bullet in REGISTRY.md explaining that the estimator
does not validate or impute globally missing calendar periods, and
advising users with calendar gaps to impute missing periods before
fitting if they want period-over-period differences rather than
gap-spanning differences.
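Users who want to follow that advice first need to find the gaps. A stdlib sketch for integer calendar periods (the function name is illustrative, not part of the library):

```python
def find_global_period_gaps(periods):
    """Sketch: detect globally missing integer calendar periods, which the
    estimator would otherwise treat as adjacent (e.g. 2001 -> 2003)."""
    observed = sorted(set(periods))
    observed_set = set(observed)
    full = range(observed[0], observed[-1] + 1)
    return [p for p in full if p not in observed_set]
```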

Regression test: test_global_period_gap_treated_as_adjacent constructs
a 4-group panel with periods [0, 1, 3] (global gap at period 2) and
asserts the fit completes with all 4 groups, a finite point estimate,
and populated per-period effects including the gap-spanning transition.

Test counts: 114 -> 115. Black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 8956d138e5ac219b6b14e6830e5d6ec503480e5c


Overall Assessment

⛔ Blocker. Highest unmitigated severity is P0.

Executive Summary

Methodology

Code Quality

No additional findings beyond the shared-validation defect above.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity P3. Impact: TODO.md does not track the missing-key drop, and even if it did, this rubric does not allow TODO tracking to mitigate a P0 silent-correctness bug. References: TODO.md:L51-L77. Concrete fix: address this in the PR rather than deferring it.

Security

No findings.

Documentation/Tests

  • Severity P2. Impact: the new registry note on global calendar gaps recommends inserting rows with NaN outcomes and letting cell aggregation handle them, but the same implementation explicitly raises on any NaN outcome. That documented workaround is therefore not currently usable. References: docs/methodology/REGISTRY.md:L472-L478, diff_diff/chaisemartin_dhaultfoeuille.py:L169-L181. Concrete fix: either support a missing-period placeholder path consistent with that note, or revise the note to recommend a preprocessing route that the current validator actually accepts.
  • Severity P3. Impact: the validation-test block says it is pinning the shared helper against “drop NaN rows in groupby,” but the actual cases only cover NaN treatment, NaN outcome, non-binary treatment, empty input, and within-cell treatment variation. There is still no regression for NaN group or NaN time, which is the exact path behind the P0 above. References: tests/test_chaisemartin_dhaultfoeuille.py:L1612-L1666, diff_diff/chaisemartin_dhaultfoeuille.py:L153-L199. Concrete fix: add explicit regressions for missing group and missing time on both fit() and twowayfeweights(), and assert the new error contract.

Review limitation: I could not run the tests here because numpy, pandas, and pytest are unavailable in this runner.

Path to Approval

  1. Add explicit missing-value validation for group and time in _validate_and_aggregate_to_cells() before the groupby, with ValueError messages parallel to the existing treatment / outcome guards.
  2. Add regressions covering NaN group and NaN time for both public entry points: ChaisemartinDHaultfoeuille.fit() and twowayfeweights().
  3. Fix the registry note on global calendar gaps so its recommended workaround matches actual validator behavior, or implement the missing-period placeholder path it currently describes.

P0: groupby silently drops NaN keys, so NaN in the group or time
columns would silently shrink the estimation sample without warning.
Added explicit NaN checks for both columns in
_validate_and_aggregate_to_cells() before the groupby call, parallel
to the existing treatment/outcome NaN validators. Both fit() and
twowayfeweights() share the validation.
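The guard can be sketched with the stdlib (helper name and message wording are assumptions; the real check runs on DataFrame columns before the pandas groupby, which silently drops NaN keys by default):

```python
import math


def check_no_missing_keys(groups, times):
    """Sketch of the pre-groupby guard: reject missing group/time values
    explicitly instead of letting the cell aggregation drop them."""
    for name, values in (("group", groups), ("time", times)):
        n_nan = sum(1 for v in values
                    if isinstance(v, float) and math.isnan(v))
        if n_nan:
            raise ValueError(
                f"{name} column contains {n_nan} missing value(s); these "
                "rows would be silently dropped by the cell aggregation."
            )
```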

Four regression tests added: test_twowayfeweights_rejects_nan_group,
test_twowayfeweights_rejects_nan_time, test_fit_rejects_nan_group,
test_fit_rejects_nan_time.

Also fixed the REGISTRY period-index-semantics workaround text that
previously suggested inserting rows with NaN outcome for missing
calendar periods. That would fail the outcome-NaN validator. Updated
to recommend finite imputed outcomes instead.

Test counts: 115 -> 119. Black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 639f5dcc90b26cc8dad00d3c08b4257092188907


Overall Assessment

⚠️ Needs changes — highest unmitigated severity is P1.

Executive Summary

Methodology

The main estimator path is in reasonable shape for a re-review: AER 2020’s sharp-design DID_{+,t} / DID_{-,t} / aggregate W^TC definitions and the dynamic paper’s DID_ℓ / cohort-recentered CI machinery are the right source anchors for this Phase 1 implementation. The meaningful departures I spot-checked are registry-noted rather than silent. (nber.org)

  • Severity P1. Impact: twowayfeweights() and fitted results.twfe_sigma_fe are presented as the Corollary 1 sign-flip threshold, but `_compute_twfe_diagnostic()` (diff_diff/chaisemartin_dhaultfoeuille.py:L2018) computes `sigma_fe` from the already-multiplied contribution weights `N_{g,t} ε_{g,t} / Σ N_{g,t} ε_{g,t}` via `sqrt(sum_treated w^2 - mean(w)^2 * n_treated)` at diff_diff/chaisemartin_dhaultfoeuille.py:L2025-L2100. AER Theorem 1 / Corollary 1 instead define `β_fe` using `(N_{g,t}/N_1) w_{g,t}` but define `σ(w)` from the paper's `w_{g,t}` itself, weighted by `N_{g,t}/N_1`. Those are different objects; even on balanced panels the current denominator is smaller by a `sqrt(n_treated)` factor, so `sigma_fe` is overstated and makes TWFE look more robust to heterogeneity than the paper supports. Concrete fix: compute the paper's `w_{g,t}` explicitly (or recover it from the contribution weights by dividing by `N_{g,t}/N_1`), then compute `sigma_fe = abs(beta_fe) / sqrt(sum((N_{g,t}/N_1) * (w_{g,t} - 1)^2))`, and keep the stored per-cell contribution weights separate if you still want to expose them. (nber.org)
  • Severity P3. Impact: the estimator intentionally equal-weights aggregated (group,time) cells after aggregation, and that behavior is explicit in code and tests at diff_diff/chaisemartin_dhaultfoeuille.py:L1425-L1448, docs/methodology/REGISTRY.md:L493-L509, and tests/test_chaisemartin_dhaultfoeuille.py:L733-L813. Under this rubric that makes it informational, not blocking. The issue is that the registry/test language currently calls this “paper-literal,” while the AER paper defines N_{d,d',t} and N_{d,d',d'',t} as sums of N_{g,t} observation counts. Concrete fix: reword the note/test to describe equal-cell weighting as an intentional library deviation on uneven cell sizes, not as literal paper fidelity. (nber.org)

Code Quality

No additional findings. The main re-review regression from the prior AI review is fixed in diff_diff/chaisemartin_dhaultfoeuille.py:L153-L169.

Performance

No findings.

Maintainability

No findings beyond the test gap noted below.

Tech Debt

No mitigating TODO.md entry covers the P1 sigma_fe issue, and under this rubric it would not downgrade that finding anyway.

Security

No findings.

Documentation/Tests

Path to Approval

  1. Fix sigma_fe so twowayfeweights() and fitted results.twfe_sigma_fe use the AER Corollary 1 w_{g,t} / N_{g,t}/N_1 formula, rather than the current contribution-weight surrogate.
  2. Add a regression test with a hand-checkable TWFE example that asserts the reported per-cell weights and sigma_fe numerically; the current tests only check that the fields are present.

…itation

Three fixes from CI review:

P1 — sigma_fe now uses observation-share-weighted variance of the
per-cell TWFE weights, matching AER Corollary 1. The previous code
computed an unweighted variance across treated cells, which
understates sigma_fe (reports a threshold that's too optimistic).
The fix uses shares s_{g,t} = N_{g,t} / N_1 where N_1 is total
treated observations, then computes:
  var_w = sum(s * (w - w_bar)^2)
  sigma_fe = |beta_fe| / sqrt(var_w)
On equal-cell-size panels (the common case and the parity-test
DGPs), the two formulas produce identical results.
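The Round-17 formula above can be written out directly (a sketch with an illustrative function name; shares s_{g,t} = N_{g,t} / N_1 are assumed to sum to 1):

```python
def sigma_fe_share_weighted(weights, shares, beta_fe):
    """Sketch of the Round-17 fix: observation-share-weighted variance of
    the per-cell TWFE weights, then sigma_fe = |beta_fe| / sqrt(var_w)."""
    w_bar = sum(s * w for s, w in zip(shares, weights))
    var_w = sum(s * (w - w_bar) ** 2 for s, w in zip(shares, weights))
    return abs(beta_fe) / var_w ** 0.5
```

With equal shares this collapses to the unweighted formula, which is why equal-cell-size panels were unaffected.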

P3 — Softened "paper-literal" wording for cell-count weighting in
REGISTRY.md. The AER paper's N_{a,b,t} notation can be read as
observation sums or cell counts. The equal-cell interpretation is
the one we implement, and the deviation from R is already labeled.

P3 — Fixed placebo citation from "Theorem 4 of AER 2020" to the
correct reference. The reviewer (citing the published AER paper and
NBER WP) notes that the placebo specification is part of the same
section as the DID_M definition, not a separate Theorem 4 (which
covers fuzzy designs). Removed the incorrect Theorem 4 number from
all 8 affected surfaces (source code, results dataclass, REGISTRY,
CHANGELOG, ROADMAP, API rst, llms-full.txt, test docstring).

119 tests still pass. Black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 62b0f93777e232c2c0c673e5b4b218ca2f97de3d


Overall Assessment

⚠️ Needs changes — highest unmitigated severity is P1.

This re-review is down to one remaining blocker: the TWFE diagnostic still does not compute the Corollary 1 sigma_fe quantity it claims to expose in diff_diff/chaisemartin_dhaultfoeuille.py:L2049-L2111.

Executive Summary

Methodology

  • Severity P1
    Impact: _compute_twfe_diagnostic() in diff_diff/chaisemartin_dhaultfoeuille.py:L2049-L2111 computes w_gt = (N_{g,t} * eps_{g,t}) / denom and then uses the treated-share-weighted variance of that object around its weighted mean for sigma_fe. But Theorem 1 / Corollary 1 separate the treated-cell contribution weights used in the TWFE decomposition from the paper’s w_{g,t} object used inside sigma(w), which is centered at 1 under treated-cell weighting. So results.twfe_sigma_fe and twowayfeweights().sigma_fe are not the sign-flip threshold the code and docs claim to expose. I did not find a mitigating Note/Deviation for this in docs/methodology/REGISTRY.md:L566-L588 or a tracking entry in TODO.md:L51-L77.
    Concrete fix: keep the exported per-cell decomposition weights if desired, but compute sigma_fe from the paper’s w_{g,t} object. Concretely, use treated observation shares s_{g,t} = N_{g,t} / N_1, form w_paper = eps / sum_treated(s * eps) on treated cells (equivalently w_paper = contribution_weight / s), then compute sigma_fe = abs(beta_fe) / sqrt(sum_treated(s * (w_paper - 1)**2)). If the public weights column remains the contribution-weight object, rename or document it accordingly so it is not conflated with Corollary 1’s w_{g,t}.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No deferrable findings. TODO.md does not track the sigma_fe issue, so it remains unmitigated under the stated rubric.

Security

No findings.

Documentation/Tests

  • Severity P2
    Impact: the TWFE methodology test in tests/test_methodology_chaisemartin_dhaultfoeuille.py:L420-L446 only checks that beta_fe, fraction_negative, sigma_fe, and weights are present/nonzero. That would not catch the current sigma_fe formula mismatch, nor any future confusion between exposed contribution weights and the paper’s w_{g,t} object.
    Concrete fix: add a small hand-checkable TWFE panel and assert beta_fe, the exposed weight column, and sigma_fe numerically. The regression should compute the expected value from the paper formula directly, not just check non-nullness.

Path to Approval

  1. Fix sigma_fe so the diagnostic uses the Corollary 1 w_{g,t} / treated-share formula, rather than the current contribution-weight surrogate.
  2. Add a numeric regression test for the TWFE diagnostic that would fail on the current implementation.
  3. Update the TWFE diagnostic docstring/API wording if the exported weights field continues to represent contribution weights rather than Corollary 1’s w_{g,t}.

P1: Round 17's sigma_fe fix still used the contribution weights
w_gt = s * w_paper as if they were the paper's w_{g,t} object. The
paper's Corollary 1 defines w_{g,t} = eps / E_treated[eps], which
does NOT include the observation-share factor in the numerator. The
contribution weights (which include the share factor) are the right
object for the TWFE decomposition itself, but sigma_fe needs the
paper's w_{g,t} centered at 1.

The fix computes w_paper separately:
  w_paper = eps_treated / sum(s * eps_treated)
  sigma(w) = sqrt(sum(s * (w_paper - 1)^2))
  sigma_fe = |beta_fe| / sigma(w)

The exported weights column still contains the contribution weights
(the TWFE decomposition object), not the paper weights.

P2: added test_twfe_diagnostic_hand_checkable_sigma_fe in
TestTWFEDiagnostic with a 4-group 3-period staggered panel.
Asserts beta_fe = 3.5, fraction_negative = 0.0, and
sigma_fe = 6.8641 numerically. This test would have caught both
the Round 17 and pre-Round-17 sigma_fe formula errors.

Test counts: 119 -> 120. Black, ruff clean.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: dd52d27c747d1c8372ee19bc76518234459ad5df


Overall Assessment

✅ Looks good — no unmitigated P0 or P1 findings in the changed dCDH estimator code. The prior sigma_fe methodology blocker appears resolved in diff_diff/chaisemartin_dhaultfoeuille.py:L2092 and is now pinned by a numeric regression in tests/test_methodology_chaisemartin_dhaultfoeuille.py:L448.

Executive Summary

  • The previous P1 on TWFE sigma_fe looks fixed: the implementation now distinguishes exported contribution weights from the Corollary 1 paper weights used inside sigma_fe, and computes the latter from treated-cell shares and residuals in diff_diff/chaisemartin_dhaultfoeuille.py:L2092.
  • The new hand-checkable TWFE test would have failed under the prior formula, so the fix is now regression-covered in tests/test_methodology_chaisemartin_dhaultfoeuille.py:L448.
  • I did not find a new unmitigated methodology defect in the changed estimator, placebo, bootstrap, A11 handling, or parameter-gating paths.
  • Bootstrap surface propagation is internally consistent now: top-level p-values/CIs are taken from the bootstrap results, while t-stats still come through safe_inference() in diff_diff/chaisemartin_dhaultfoeuille.py:L1133.
  • One remaining issue is documentation-only: the helper docstring and user docs still blur the distinction between exported TWFE decomposition weights and the paper’s Corollary 1 w_{g,t} object used for sigma_fe.
  • I could not run tests in this runner because pytest is not installed.

Methodology

No findings. The re-review blocker on sigma_fe appears addressed by the Corollary 1 implementation in diff_diff/chaisemartin_dhaultfoeuille.py:L2092 and the new numeric test in tests/test_methodology_chaisemartin_dhaultfoeuille.py:L448.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not see a new correctness limitation that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

  • Severity P3
    Impact: _compute_twfe_diagnostic() still documents the old sigma_fe formula in terms of the exported w, even though the code now correctly uses the Corollary 1 paper weights derived from treated-share residuals; the README/API text also says only “per-cell weights,” which leaves the two weight objects easy to conflate. See diff_diff/chaisemartin_dhaultfoeuille.py:L2018, README.md:L1212, and docs/api/chaisemartin_dhaultfoeuille.rst:L148.
    Concrete fix: update the helper docstring and user-facing docs to state explicitly that weights/twfe_weights are Theorem 1 contribution weights, while sigma_fe is computed from the Corollary 1 paper w_{g,t} object built internally from treated observation shares; renaming the exported column to contribution_weight would remove the ambiguity entirely.

P3: the helper docstring still described the old sigma_fe formula
in terms of the exported contribution weights. Updated to explicitly
distinguish the two weight objects:
- Contribution weights (Theorem 1 decomposition, exported in the
  weights column): cw = N * eps / denom
- Paper weights (Corollary 1, used internally for sigma_fe):
  w_paper = eps / sum(s * eps), centered at 1
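
The share-factor link between the two weight objects can be checked numerically. The arrays below are made-up illustrations, not values from the library:

```python
import numpy as np

# Made-up treated-cell inputs, for illustration only
N_gt = np.array([40.0, 30.0, 30.0])   # treated (g, t) cell sizes
eps = np.array([0.8, -0.2, 0.6])      # epsilon_{g,t} on treated cells

s = N_gt / N_gt.sum()                 # observation shares s_{g,t} = N_{g,t} / N_1
cw = N_gt * eps / np.sum(N_gt * eps)  # Theorem 1 contribution weights (sum to 1)
w_paper = eps / np.sum(s * eps)       # Corollary 1 weights (share-weighted mean 1)

# The contribution weights equal s * w_paper cell by cell
print(np.allclose(cw, s * w_paper))  # → True
```

This is the identity cw = s * w_paper from the commit message: dividing both numerator and denominator of cw by N_1 converts cell sizes into shares, so the two objects differ exactly by the observation-share factor.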

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber
Owner Author

igerber commented Apr 12, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 5b46f407c8da9983725820ee04f83e4ba651fdd7


Overall Assessment

✅ Looks good — no unmitigated P0 or P1 findings in the changed dCDH estimator PR.

Executive Summary

Methodology

No findings. Against the Methodology Registry baseline, I did not find an undocumented estimator, weighting, variance/SE, identification, or default-behavior mismatch in the changed dCDH code. The prior sigma_fe issue remains fixed in chaisemartin_dhaultfoeuille.py#L2100 and regression-covered in test_methodology_chaisemartin_dhaultfoeuille.py#L448.

Code Quality

No findings. New public parameters are propagated consistently across constructor/introspection/wrapper paths in chaisemartin_dhaultfoeuille.py#L356, chaisemartin_dhaultfoeuille.py#L413, and chaisemartin_dhaultfoeuille.py#L2180.

Performance

No findings. The changed estimator and bootstrap paths are vectorized over pre-aggregated matrices, and the bootstrap helper uses batched weight generation in chaisemartin_dhaultfoeuille_bootstrap.py#L290.
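
Batched multiplier-weight generation of the kind described can be sketched as follows. The function name and signature are illustrative, not the library's actual helper; the three distributions match the PR's Rademacher / Mammen / Webb options:

```python
import numpy as np

def multiplier_weights(kind, n_clusters, n_boot, rng):
    """Draw an (n_boot, n_clusters) matrix of multiplier-bootstrap weights
    in one batched call (illustrative sketch, not the library's helper)."""
    size = (n_boot, n_clusters)
    if kind == "rademacher":
        # +1 / -1 with equal probability
        return rng.choice([-1.0, 1.0], size=size)
    if kind == "mammen":
        # Two-point distribution with mean 0, variance 1, third moment 1
        golden = (1.0 + np.sqrt(5.0)) / 2.0
        p_low = golden / np.sqrt(5.0)  # P(w = 1 - golden) ≈ 0.7236
        u = rng.random(size)
        return np.where(u < p_low, 1.0 - golden, golden)
    if kind == "webb":
        # Six-point distribution: ±sqrt(k/2) for k in {1, 2, 3}, equal probability
        points = np.sqrt(np.array([0.5, 1.0, 1.5]))
        points = np.concatenate([-points, points])
        return rng.choice(points, size=size)
    raise ValueError(f"unknown multiplier weight kind: {kind!r}")
```

Generating all n_boot draws in one call is what makes the bootstrap loop cheap: each replication is then a weighted re-aggregation of the pre-computed cell matrices rather than a fresh pass over the panel.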

Maintainability

No findings. The shared validation/aggregation helper and explicit sample-contract warning reduce drift between fit() and twowayfeweights() in chaisemartin_dhaultfoeuille.py#L94 and chaisemartin_dhaultfoeuille.py#L749.

Tech Debt

No findings. I did not identify a new deferrable item that needs tracking in TODO.md.

Security

No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 12, 2026
The test_rank_deficient_action_error_raises_on_fitted_twfe test
failed on all CI runners because the Rust backend's solve_ols
handles underdetermined systems without raising ValueError (unlike
the Python backend). The test relied on solve_ols raising "Fewer
observations than parameters" for a 1-group panel, which is
backend-specific behavior.

Fix: added back the n_groups < 2 or n_times < 2 guard in
_build_group_time_design (removed in Round 14, needed again now).
This guard fires deterministically on both backends before
solve_ols is called. Updated the test to match the guard's error
message ("at least 2 groups") instead of the backend-specific
solve_ols message.

120 tests still pass locally. Black, ruff clean.
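
The guard described above can be sketched as follows; the function body and error message are illustrative, following the commit's description rather than the library's exact code:

```python
def build_group_time_design(groups, times):
    """Guarded design-matrix builder (illustrative, not the library's code).

    Raises before any OLS solve, so the error is deterministic on both the
    Python and Rust backends instead of depending on whether solve_ols
    errors on underdetermined systems."""
    n_groups = len(set(groups))
    n_times = len(set(times))
    if n_groups < 2 or n_times < 2:
        raise ValueError(
            f"TWFE decomposition requires at least 2 groups and 2 periods; "
            f"got {n_groups} group(s) and {n_times} period(s)."
        )
    # ... build the two-way fixed-effects dummy matrix here ...
    return n_groups, n_times
```

A 1-group panel then fails with the same "at least 2 groups" message everywhere, which is what the updated test asserts instead of the backend-specific solve_ols message.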

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@igerber igerber merged commit 5533a2b into main Apr 12, 2026
19 of 20 checks passed
@igerber igerber deleted the dcdh-estimator branch April 12, 2026 15:03
@igerber igerber mentioned this pull request Apr 12, 2026